Schooling and Labor Market Consequences of School
Construction in Indonesia: Evidence from an Unusual Policy
Experiment

Esther Duflo*

Abstract 

Between 1973 and 1978, the Indonesian Government constructed over 61,000 primary
schools throughout the country. This is one of the largest school construction programs on
record. I evaluate the effect of this program on education and wages by combining differences
across regions in the number of schools constructed with differences across cohorts induced
by the timing of the program. The estimates suggest that the construction of primary schools
led to an increase in education and earnings. Children aged 2 to 6 in 1974 received 0.12 to
0.19 more years of education for each school constructed per 1,000 children in their region
of birth. Using the variation in schooling generated by this policy as instrumental variables
for the impact of education on wages generates estimates of economic returns to education
ranging from 6.8 percent to 10.6 percent. (JEL I2, J31, O15, O22)

The question of whether investment in infrastructure increases human capital and reduces
poverty has long been a concern to development economists and policymakers. For example,
availability of schooling infrastructure has been shown to be positively correlated with completed
schooling or enrollment by Nancy Birdsall (1985) in urban Brazil, Dennis DeTray (1988) and Lee
A. Lillard and Robert J. Willis (1994) in Malaysia, Victor Lavy (1996) in Ghana, and Anne Case
and Angus Deaton (1996) in South Africa. The principal methodological problem with these
studies is that schools are not randomly allocated across communities. In education systems
relying on local financing, more affluent communities can afford to build more schools. Children
in these communities are likely to be more educated and earn more as adults. Alternatively,
in centralized education systems, government resources may be allocated to regions that are
lagging behind (as was the case with school construction in Indonesia in the 1970's). As a
result, education and wages may be lower in the regions that have more government schools.
The ideal experiment to estimate the effects of building schools would be to allocate schools
randomly to some communities and not to others, and then to compare education and earnings
across communities. In the absence of evidence from such an experiment, it is necessary to rely
on exogenous natural variations in combination with statistical modeling strategies.

This paper exploits a dramatic change in policy to evaluate the effect of building schools
on education and earnings in Indonesia. In 1973, the Indonesian government launched a major
school construction program, the Sekolah Dasar INPRES program. Between 1973-74 and 1978-79,
more than 61,000 primary school buildings (an average of two schools per 1,000 children
aged 5 to 14 in 1971) were built. The government's goal was to increase enrollment rates among
children aged 7 to 12 from 69 percent in 1973 to 85 percent by 1978. In 1978, the enrollment
rate reached 84 percent for males and 82 percent for females (World Bank (1990a)). Indonesia's
primary schooling expansion is cited by the World Bank as "one of the most successful cases
of large-scale school expansion on record" (World Bank (1990a)). This program represented a
drastic change in Indonesian policy. Prior to 1973, capital expenditures in the education sector
were very low (Ruth Daroesman (1971)), and enrollment rates seemed to be declining in the
early 1970's (Daroesman (1971), Ward Heneveld (1978)).

The identification strategy in this paper uses the fact that exposure to the school construction
program varied by region of birth and date of birth. Program intensity varied substantially
across regions, because the government tried to allocate more schools to regions where
initial enrollment was low. Therefore, the education of individuals who were young enough to be
in school when the program was launched should be higher than the education of older individuals
in all regions, but the difference should be larger in the regions that received more schools. I use
a difference-in-differences estimator that controls for (additive) systematic variation of education
both across regions and across cohorts. Only the combination of the two variations is treated
as exogenous. Similar strategies are often used in the public finance literature to evaluate the
effects of public policies. Mark R. Rosenzweig and Kenneth I. Wolpin (1988a) first proposed
using fixed-effects methods for evaluating the impact of public programs in developing countries,
which were applied in the Indonesian context by Mark Pitt, Rosenzweig, and Donna Gibbons
(1993) and Paul Gertler and John W. Molyneaux (1994). This schooling reform is particularly
well suited for fixed-effects methods because the variation in inputs comes from a well-defined
reform. This makes it possible to test an implication of the identification assumption, namely
that the combination of the variation over time and across regions is exogenous. I show that
among early cohorts, who did not benefit from the program because they were too old to attend
primary school when it started (individuals 12 or older in 1974), the increase in educational
attainment from one cohort to another is not correlated with the number of INPRES schools
per capita built from 1973 to 1978. The same strategy is used to estimate the impact of this
program on 1995 wages. Finally, I use this exogenous source of variation in education to estimate
the impact of years of schooling on wages.
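The logic of this estimator can be sketched in Python under simplifying assumptions that are not the paper's specification: only two cohorts (one too old to be exposed, one young enough to be exposed), region-level mean education, and program intensity measured in schools per 1,000 children. Differencing the two cohorts within each region removes the additive region effect, the intercept absorbs the additive cohort effect, and the slope of that difference on intensity is the difference-in-differences estimate. The function name `did_slope` and all the numbers are hypothetical.

```python
from statistics import mean

def did_slope(intensity, educ_old, educ_young):
    """Difference in differences with a continuous program intensity.

    intensity[j]  : schools built per 1,000 children in region j
    educ_old[j]   : mean years of education of the unexposed cohort in region j
    educ_young[j] : mean years of education of the exposed cohort in region j
    Returns (intercept, slope) from regressing the within-region cohort
    difference on intensity; the slope is the effect of one more school
    per 1,000 children.
    """
    diff = [y - o for y, o in zip(educ_young, educ_old)]  # region effects cancel
    p_bar, d_bar = mean(intensity), mean(diff)
    cov = sum((p - p_bar) * (d - d_bar) for p, d in zip(intensity, diff))
    var = sum((p - p_bar) ** 2 for p in intensity)
    slope = cov / var
    return d_bar - slope * p_bar, slope

# Hypothetical regions: more schools went where education was lower,
# the common cohort trend is 1.0 year, and the true effect is 0.15 years/school.
intensity = [1.0, 1.5, 2.0, 2.5, 3.0]
region_effect = [9.0 - p for p in intensity]
educ_old = list(region_effect)
educ_young = [r + 1.0 + 0.15 * p for r, p in zip(region_effect, intensity)]

intercept, slope = did_slope(intensity, educ_old, educ_young)
print(round(intercept, 3), round(slope, 3))  # 1.0 0.15
```

In this made-up example, comparing the young cohort across regions alone would suggest that schools lower education, since its education falls with intensity; the within-region difference recovers the assumed 0.15 years per school. The result is only as credible as the assumption that, absent the program, cohort differences would not vary systematically with intensity.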

The question of whether an increase in educational attainment causes an increase in income
levels is a basic concern for development economists. A large body of literature exists on returns
to education in developing countries (see George Psacharopoulos (1973, 1981, 1985, 1994) for
surveys). Estimated returns are, in general, larger in developing countries than in industrialized
countries. Surprisingly, however, very little effort has been made to estimate returns to education
using only exogenous variations in schooling. The bias in estimates that treat an individual's
education level as exogenous is likely to be important in developing countries. In particular,
liquidity constraints and family or community background are likely to influence both education
and earnings. Jere R. Behrman's (1990) assessment of the existing literature is that most
standard estimates of returns to education in developing countries are likely to overstate them.
John Strauss and Duncan Thomas (1995) review several additional articles and suggest that
the evidence is inconclusive and deserves further study. However, it is difficult to find sources
of exogenous variation in education. Most factors that influence education are also likely to
have indirect effects on income. This is clearly true in the case of family background variables
(e.g., assets and parental education), which are often used as instruments or included in the
set of instruments on the grounds that they are good predictors of education. If the concern is
that unobserved family and community background characteristics are sources of bias in OLS
estimates of returns to education, observed family and community variables should be entered as
covariates in the wage equation and are not likely to be valid instruments. This is often also true
of other potential instruments. For example, birth order has been shown to affect education.
But it also affects health, which in turn affects income.^1 Proximity of parents' residence to
educational facilities has been used as an instrument for college education in the United States
(David Card (1993); Thomas J. Kane and Cecilia E. Rouse (1995)) and for years of secondary
education in the Philippines (John Maluccio (1998)).^2 These studies suffer from the problem
outlined above. Depending on how schools are allocated, schooling and wages might be lower or
higher in households living near or far away from a school, even if there is no causal effect of the
proximity of a school on education. This paper exploits the exogenous variation in education
created by the INPRES program to construct instrumental variables estimates of the effect of
education on wages.

Using a large cross-section of men born between 1950 and 1972 from the 1995 intercensal
survey of Indonesia (SUPAS), I linked an adult's education and wages with district-level data
on the number of new schools built between 1973-74 and 1978-79 in his region of birth. The
exogenous variables (and the instruments in the wage equation) are interactions between dummy
variables indicating the age of the individual in 1974 and the intensity of the program in his
region of birth between 1973 and 1978. Similar strategies have been used to estimate the effect
of school quality on returns to education (Card and Alan B. Krueger (1992)), the effect of teen
fertility on educational and labor market outcomes (Joshua Angrist and W. N. Evans (1996)),
and the effect of college education on earnings (Card and Thomas Lemieux (1998)).
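To make the construction of these variables concrete, the sketch below builds, for one individual, the vector of interactions between age-in-1974 dummies and the program intensity in his region of birth. The function name and the age range are assumptions for illustration: it uses ages 2 to 23 and leaves age 24 out, on the reasoning that if the regression also includes region-of-birth fixed effects, the full set of interactions would sum to the region's intensity, which those fixed effects already absorb.

```python
def intensity_interactions(birth_year, intensity, ages=range(2, 24)):
    """Return [1{age in 1974 == a} * intensity for a in ages].

    birth_year: year of birth (the sample covers 1950-1972, i.e. ages 2-24 in 1974)
    intensity:  INPRES schools per 1,000 children in the region of birth
    ages:       age dummies to interact; age 24 is left out as the reference
                (an illustrative choice, not taken from the paper)
    """
    age = 1974 - birth_year
    return [intensity if age == a else 0.0 for a in ages]

row = intensity_interactions(1970, 2.44)         # aged 4 in 1974, high-program region
print(row[:4])                                   # [0.0, 0.0, 2.44, 0.0]
print(sum(intensity_interactions(1950, 2.44)))   # 0.0, reference age 24
```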

The remainder of this paper is organized as follows. In section I, I describe the INPRES
program and the data. In section II, I present the identification strategy and discuss the
identification assumption using a simple model of endogenous schooling. In section III, I present
the estimated impact of the program on education. Section IV is devoted to the estimation of
the effect of the program on wages. In section V, I estimate economic returns to education. In
section VI, I combine the estimates of the program effect on wages with detailed data on the cost
of education in Indonesia to present a tentative cost-benefit analysis of the program. Section
VII concludes the paper.

I. The Program

A. Data

The 1995 intercensal survey of Indonesia is a sample of over 200,000 households. It is conducted
every ten years by the Central Bureau of Statistics of Indonesia. Basic data is collected on each
individual in the household. In this study, I focus on men born between 1950 and 1972 (which
ensures that the individuals in the sample have completed their education). Summary statistics
for this sample are presented in table 1, panel A. There are 152,989 individuals in the sample,
with an average level of 7.98 years of completed education (6 years of education correspond to
graduation from primary school). The SUPAS collects data on last month's wage for people who
are working for pay. From this, I calculate an hourly wage by dividing the monthly wage by the
number of hours worked during the month. I estimate the effects of the program on education
using the complete sample, but the wage equation is estimated using the sample of individuals
who work for a wage, which has only 60,633 individuals (sample selection issues are examined
below).
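The wage variable described above is simple arithmetic; a minimal Python sketch follows. How the SUPAS codes missing wages or zero hours is not described here, so returning `None` for such cases (and hence leaving them out of the wage sample) is an assumption, as are the function name and the example figures.

```python
import math

def log_hourly_wage(monthly_wage, hours_in_month):
    """Log of (last month's wage / hours worked during the month).

    Returns None when either input is missing or not positive, i.e. for people
    who would not enter the wage sample (an assumed coding convention).
    """
    if monthly_wage is None or hours_in_month is None:
        return None
    if monthly_wage <= 0 or hours_in_month <= 0:
        return None
    return math.log(monthly_wage / hours_in_month)

print(log_hourly_wage(200_000, 200))  # log(1000.0): hypothetical wage and hours
print(log_hourly_wage(0, 160))        # None
```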

The SUPAS asks in which district the individual was born. I use this information to match
the individual survey data with district-level data (pertaining to the situation in the district
in the 1970's) collected from various sources: the Ministry of Education and Culture, presidential
instructions published by the Bappenas (the Planning Agency), and published results from the
1971 census.^3 District-level descriptive statistics are presented in table 1, panel B.

B. The Sekolah Dasar INPRES Program

Since 1973, the "presidential instructions" (INPRES) have been the main centrally controlled
fiscal mechanism determining the spatial redistribution of the aggregate gains to Indonesia from the
oil boom (Martin Ravallion (1988)). Over the years, the scheme has evolved into a complex
system of grants for various purposes, such as building schools, health clinics, and roads, as well
as more routine government spending.

The Sekolah Dasar INPRES program was one of the first INPRES programs and the largest
at the time it was launched (in 1973-74). During the first Indonesian five-year plan, the
emphasis was on basic infrastructure and sectoral development. Agriculture, industry/mining, and
communication absorbed 70 percent of the development budget (Hal Hill (1996)). At the outset
of the second five-year plan, which emphasized the need for equity, changing priorities were
evident. Regional development became an important item (absorbing 15 percent of the budget).
The Sekolah Dasar INPRES program represented, in turn, 12 percent of the regional
development budget in 1973 and 28 percent in 1979 (for comparison, health expenditures represented
only 3.4 percent of the budget for regional development in 1973 and 5.5 percent in 1979).^4 The
budget itself, thanks to the oil boom, expanded very rapidly during this period. Real
expenditures more than doubled between 1973 and 1980, while the share of oil in government revenues
grew from 25 percent in 1971 to 48 percent in 1974 and peaked at 62 percent in 1981. Due to
the combination of these two factors (change in priorities and increase in revenues), the Sekolah
Dasar INPRES program became extremely important.

Basic data about the program is presented in table 1, panel B. Between 1973-74 and 1978-79,
61,807 new buildings were constructed.^5 This represented approximately 222 new schools (and
666 new teachers) per district, or over one school for every 500 children aged 5 to 14 in 1971.
This amounted to double the number of existing school buildings. However, since the INPRES
schools were smaller than most existing schools (three teachers), the increase in the number of
teachers was only 43 percent.^6 Prior to 1973, in contrast, very few schools were constructed.
There was a complete ban on the recruitment of new civil servants, and some newly trained
teachers could not find employment (Daroesman (1971)).

Once an INPRES school was established, the central government recruited the teachers and
paid their salaries (each school was designed for three teachers and 120 pupils). The minimum
requirement to be a primary school teacher was an upper secondary school degree, generally
obtained in a special training school. In 1971, 71 percent of the primary school teachers met this
qualification, while 29 percent were underqualified. An effort to train more teachers paralleled
the INPRES program (World Bank (1990b)).

The program was designed explicitly to target children who had not previously been
enrolled in school, and a separate budget was designed for the rehabilitation of existing run-down
buildings (Heneveld (1978)). The distribution of funds is described in detailed governmental
instructions (the "presidential instructions": Bappenas (1973, 1974, 1975, 1976, 1977, 1978)). All
schools were constructed identically. The instructions were explicit about the allocation rule. In
1973-74 and 1974-75, the number of schools to be constructed in each district was proportional
to the number of children of primary school age not enrolled in school in 1972. From 1975-76
on, the rule was spelled out slightly differently but had similar implications: the number of
schools to be constructed was proportional to the number of new pupils to be accommodated
between 1972 and 1978 in the region to satisfy the target enrollment rate of 85 percent in 1978.
More schools were allocated to the transmigration regions.^7 The final allocation was decided
by planners in the Ministry of Education and Culture, with the approval of the Department of
Finance and the Bappenas, the administration responsible for the final implementation of the
program. Funds were then sent through the governor's office to the local administrations, which
supervised the actual construction. The instructions listed the exact number of schools to be
constructed in each district (kabupaten/kotamadya).

I use this planned number in my analysis, rather than the actual number of schools
constructed, which is not available. In 1983, the Ministry of Education and Culture conducted a
survey of the implementation of the program from 1973 to 1983. According to this study, the
actual number of schools constructed matched the plans until 1980. Some discrepancy occurs
thereafter. The Ministry of Education and Culture has also published data on the number of
schools operating in 1973-74 and 1978-79. This data suggests that the actual increase in the
number of functioning schools was lower than the number of schools constructed between
1973-74 and 1977-78 under the INPRES program. One reason is that prior to 1973, several schools
were frequently operating in the same building (as soon as a school had more than one class per
grade, it became two schools, with separate head-teachers and administrative status). School
buildings in urban areas could operate in as many as four shifts a day (Daroesman (1971)). It
is quite possible that some new buildings were used to reduce overcrowding in the old ones.
Consistent with this, the average increase in the number of teachers implied by the allocation of
INPRES schools is very close to the increase in the actual number of teachers recorded by the
Ministry of Education.

Using this data, I first checked whether the actual allocation decided upon by the Ministry
of Education corresponds to the stated allocation rule. I used the Ministry data on the number
of children enrolled in school in the school year 1973-74 to construct an estimate of the rate of
nonenrollment in 1973-74 among children aged 5 to 14.^8 Table 2 presents the results
of a regression of the logarithm of the number of INPRES schools constructed in each region on
the logarithm of the nonenrollment rate and the logarithm of the number of children. Since the
stated rule makes the number of schools proportional to the number of children times the
nonenrollment rate, it predicts that both coefficients should be close to one. The logarithm of the number
of schools built in each region between 1973-74 and 1978-79 is positively correlated with the
logarithm of the number of children and with the logarithm of one minus the enrollment rate.^9
A substantial part of the variation between regions is explained by these two factors alone: the
R-squared is 0.78. The coefficient of the nonenrollment rate is, however, not equal to one. This
might be explained by measurement error in the enrollment measure and by the fact that I
do not use the actual formula, which was nonlinear (regions that had an enrollment rate of 85
percent in 1972 were not supposed to get any schools). Finally, the implementation of such a
massive program in a country as large and heterogeneous as Indonesia was bound to involve some
deviation from the general rule.
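The check in table 2 can be illustrated with a self-contained Python sketch. It generates hypothetical districts whose allocation follows the 1973-74 rule exactly (schools proportional to the number of children times the nonenrollment rate) and fits the log-log regression by ordinary least squares. Under the exact rule both slopes equal one; the district figures, the proportionality constant, and the helper `ols` are made up for the example, and real allocations would deviate for the reasons given above.

```python
import math

def ols(y, X):
    """OLS coefficients from the normal equations (X'X) b = X'y.

    Solved by Gaussian elimination with partial pivoting; each row of X
    includes a leading 1 for the intercept.
    """
    k = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(a[row][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= f * a[c][j]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

# Hypothetical districts: children aged 5-14 and nonenrollment rates.
children = [20_000, 35_000, 50_000, 80_000, 120_000, 60_000]
nonenroll = [0.10, 0.25, 0.40, 0.15, 0.30, 0.50]
# Exact proportional rule with an arbitrary constant (0.005 schools per
# nonenrolled child); no noise and no 85 percent cutoff.
schools = [0.005 * n * u for n, u in zip(children, nonenroll)]

y = [math.log(s) for s in schools]
X = [[1.0, math.log(u), math.log(n)] for n, u in zip(children, nonenroll)]
const, b_nonenroll, b_children = ols(y, X)
print(round(b_nonenroll, 6), round(b_children, 6))  # 1.0 1.0
```

Adding noise to `schools`, or setting allocations to zero above an enrollment cutoff, would move the nonenrollment coefficient away from one, which is the kind of deviation discussed in the text.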

II. Identification Strategy

A. Sources of Variation

The date of birth and the region of birth jointly determine an individual's exposure to the
program. A child born in 1962 or before was 12 or older in 1974, when the first INPRES schools
were constructed. Indonesian children normally attend primary school between the ages of 7
and 12. Thus, a child aged 12 or older in 1974 normally did not benefit from the program,
since he should have left primary school before the school year 1974-75, when the first INPRES
schools (built in 1973-74) were opened. Grade repetition and delayed school entry could lead
a few of these children to benefit from the program for their last year in school (which would
lead to downward bias in the estimation if they are mistakenly considered as nontreated).^10
However, the exposure of children aged 12 or older in 1974 is very limited. According to the
1993 Indonesian Family Life Survey (IFLS) data set (conducted in 1993 by RAND and the
Demographic Institute at the University of Indonesia), less than 3 percent of these children were
still in primary school in 1974 (the schools were opened in the second half of 1974). A child
born in 1968 was 6 in 1974 and 11 in 1979. He was exposed to the first wave of construction
while he was of primary school age but only partly to the next waves. A child born in 1972 was
fully exposed. In summary, children 12 or older in 1974 were not exposed to the program, and
for younger children exposure decreases with their age in 1974. As a result, the effect of
the program should be 0 for children 12 or older in 1974 and should increase as age in 1974 decreases.
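The mapping from year of birth to exposure described above can be written down directly. In the sketch below, the function name and the group labels are hypothetical; the age bands follow the text: unexposed at 12 or older in 1974, partial exposure for ages 7 to 11, and exposure throughout primary school for ages 2 to 6 (the treated cohort compared in table 3).

```python
def exposure_group(birth_year):
    """Classify a man from the 1950-1972 birth cohorts by his age in 1974.

    Primary school normally covers ages 7 to 12, and the first INPRES schools
    opened for the 1974-75 school year, so men aged 12 or older in 1974 are
    treated as unexposed.
    """
    age = 1974 - birth_year
    if not 2 <= age <= 24:
        raise ValueError("birth year outside the 1950-1972 sample")
    if age <= 6:
        return "exposed throughout primary school"   # treated cohort, table 3
    if age <= 11:
        return "partially exposed"                   # exposure falls with age
    if age <= 17:
        return "unexposed, aged 12-17"               # comparison cohort, table 3 panel A
    return "unexposed, aged 18-24"                   # older cohort, table 3 panel B

for year in (1972, 1968, 1965, 1962, 1950):
    print(year, 1974 - year, exposure_group(year))
```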

The intensity of the program varied across regions. The region of birth is highly correlated
with the region of education: 91.5 percent of the children in the IFLS sample were still living in
the district they were born in at age 12. Because program intensity is assigned by region of birth,
migration introduces measurement error, which leads to downward bias in the estimation of the
program effect. However, endogenous migration could bias estimates of program effects obtained
by comparing outcomes according to the individual's region of education (Rosenzweig and Wolpin
(1988b)). Some families might have moved between the birth of a child and his education period
to benefit from the program. Region of birth, on the other hand, is not endogenous with respect to
the program, since all individuals in the sample were born before the program was started.
Therefore, the parents could not have moved to the high program regions before the birth of the
child to benefit from the program.^11

Thus, the basic idea behind the identification strategy can be illustrated using simple
two-by-two tables. In table 3, I present results that illustrate the identification strategy and a test of
an implication of the identifying assumption. These results are imprecise, because
only a small part of the available information is used. They are provided as an illustration of
the identification strategy. This table shows means of education and wages for different cohorts
and program levels.^12 In table 1, I indicate the program intensity in both types of regions. In
high program regions an average of 2.44 schools per 1,000 children were built; in low program
regions, an average of 1.54 schools per 1,000 children were built. The difference was 0.90 school
per 1,000 children. In table 3, panel A, I present the main experiment. I compare the educational
attainment and the wages of individuals who had little or no exposure to the program (they
were 12 to 17 in 1974) to those of individuals who were exposed all the time they were in
primary school (they were 2 to 6 in 1974), in both types of regions. The program provision that
more schools would be built in lower enrollment regions is reflected in the differences between the
education in low and high program regions. In both cohorts, the average educational attainment
and wages in regions that received fewer schools are higher than in regions that received more
schools. In both types of regions, average educational attainment increased over time. However,
it increased more in regions that received more schools. The difference in these differences can be
interpreted as the causal effect of the program, under the assumption that, in the absence of the
program, the increase in educational attainment would not have been systematically different
in low and high program regions. Relative to the change across cohorts in low program regions,
individuals aged 2 to 6 in 1974 who were born in high program regions received on average 0.12
more years of education, and the logarithm of their wages in 1995 was 0.026 higher. These
differences in differences are not significantly different from 0. This simple estimator suggests
that one school per 1,000 children increased education by 0.13 years (0.12 divided by 0.90) and
log wages by 0.029 for children aged 2 to 6 when the program was initiated. The Wald estimate
(a simple, but imprecise, instrumental variables estimator) of returns to education is the ratio
of these two estimates.
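The arithmetic behind these figures, and the Wald ratio mentioned in the last sentence, can be reproduced from the rounded numbers quoted in this paragraph. This is only a check of the arithmetic on those rounded inputs, not a reproduction of table 3; the variable names are illustrative.

```python
# Rounded figures quoted in the text.
intensity_high, intensity_low = 2.44, 1.54   # schools per 1,000 children
did_educ = 0.12      # difference in differences, years of education
did_logwage = 0.026  # difference in differences, log hourly wage in 1995

intensity_gap = intensity_high - intensity_low      # 0.90
educ_per_school = did_educ / intensity_gap          # about 0.13 years
logwage_per_school = did_logwage / intensity_gap    # about 0.029
# Wald estimate of the return to a year of education: the intensity gap
# cancels, so this equals did_logwage / did_educ.
wald = logwage_per_school / educ_per_school

print(round(intensity_gap, 2), round(educ_per_school, 3),
      round(logwage_per_school, 3), round(wald, 3))
```

With these rounded inputs the ratio comes out at roughly 0.22; as noted above, this simple estimator is imprecise, and the figure is only an arithmetic illustration.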

This difference-in-differences estimator is comparable to the fixed-effects procedure proposed
for the evaluation of social programs in developing countries.^13 As Strauss and Thomas (1995)
point out in their assessment of the approach adopted by these papers, the identification
assumption should not be taken for granted: the pattern of increase in education could vary
systematically across regions. Moreover, the simple differences (the differences in education
across cohorts and between regions) are large. This makes the difference in differences sensitive
to assumptions about functional form (Bruce D. Meyer (1995), James J. Heckman (1996)). In
particular, since regions with low initial education received more schools, if the increase in
education was negatively correlated with initial levels, this pattern would be observed in the
data even if the program had no effect.

An important feature of this experiment, therefore, is that an implication of the
identification assumption can be tested. Individuals aged 12 or older in 1974 were not exposed to the
program. Therefore, in this age group, the increase in education between cohorts should not
differ systematically across regions. This control experiment exploits the presence of multiple
control groups formed by the successive cohorts not exposed to the program (see Heckman and
V. Joseph Hotz (1989) and Paul R. Rosenbaum (1987)).

In table 3, panel B, I present this control experiment. I consider a cohort aged 18 to 24
in 1974 and a cohort aged 12 to 17 in 1974. The estimated differences in differences are very
close to 0. These results provide some suggestive evidence that the differences in differences
are not driven by inappropriate identification assumptions, but the differences in differences are
imprecisely estimated. In panel B, for example, the differences in differences are not significantly
different from 0, but they are not significantly different from the differences in differences in
panel A either. The remainder of this paper elaborates on this strategy to obtain more convincing
results.

B. Conceptual Framework

In this subsection, I use a simple version of the model of endogenous schooling developed in
Card (1995), who draws on Gary Becker (1967). I extend it to take into account the general
equilibrium implications of the program, since such a large program could have affected the
returns to education.^14

I write an individual's utility in the form $U(w, S) = \ln w(S) - h(S)$, where $h(S)$ is the cost
of schooling function and $w(S)$ is the income of an individual with schooling $S$. Following Card,
the marginal cost of schooling is written $h'(S) = r_{ijk} + \phi S$. I assume that returns to schooling
are linear:^15

(1) $y_{ijk} = \ln w_{ijk} = a_{ijk} + b_{ijk} S$,

where $y_{ijk}$ is the logarithm of the wage of an individual $i$, born in region $j$, in cohort $k$.
Individuals maximize expected utility. Optimal choice of schooling implies

(2) $S_{ijk} = \frac{E_k b_{ijk} - r_{ijk}}{\phi}$,

where $E_k b_{ijk}$ denotes the expectation of future returns to education at the time when the
individual makes his schooling decision. Heterogeneity is modeled additively: $b_{ijk} = b_{jk} + \nu_i$
and $r_{ijk} = r_{jk} + \omega_i$, where $b_{jk}$ is the average return to education for cohort $k$ in
region $j$, and $\nu_i$ and $\omega_i$ are the individual deviations from the regional averages.
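Equation (2) is the first-order condition $E_k b_{ijk} = h'(S) = r_{ijk} + \phi S$. A quick numerical check in Python, under assumptions not stated in the text (returns known with certainty, so $E_k b_{ijk} = b$, and $h(S) = rS + \phi S^2/2$, the marginal cost integrated with $h(0) = 0$), confirms that directly maximizing $U$ gives the closed form $(b - r)/\phi$. Parameter values and function names are hypothetical.

```python
def utility(s, a, b, r, phi):
    """U(S) = ln w(S) - h(S), with ln w(S) = a + b*S and h(S) = r*S + phi*S**2 / 2."""
    return a + b * s - (r * s + 0.5 * phi * s ** 2)

def argmax_concave(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1   # maximum lies to the right of m1
        else:
            hi = m2   # maximum lies to the left of m2
    return (lo + hi) / 2

a, b, r, phi = 0.5, 0.10, 0.02, 0.01          # hypothetical values
s_numeric = argmax_concave(lambda s: utility(s, a, b, r, phi), 0.0, 30.0)
s_closed = (b - r) / phi                       # first-order condition: b = r + phi*S
print(round(s_numeric, 4), round(s_closed, 4))  # 8.0 8.0
```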

To capture the most important modifications induced by the program, the cost of education
is modeled as a linear function of the number of primary schools per capita ($Z_{jk}$) and other
regional characteristics not affected by the program ($\mu_{jk}$):

(3) $r_{jk} = \alpha_1 Z_{jk} + \mu_{jk}$

Returns to education are in turn affected by the quality of education and the supply and demand
for skills. I write them as a linear function of the average education in the region ($\bar{S}_j$), the
average education in the country ($\bar{S}$), the quality of schooling in the region at the time people
received their education ($q_{jk}$), and regional economic conditions, which determine the demand
for skills in the region ($v_j$):^16

(4) $b_{jk} = 2\beta_1 \bar{S}_j + 2\beta_2 \bar{S} + \gamma q_{jk} + v_j$.

The program directly affected the cost of education and indirectly its returns, through potential
changes in school quality and general equilibrium effects of an increase in education on the
price of skills. Assume (realistically) that the generations educated before the program did not
anticipate it when choosing their education level ($E_0 Z_{jk} = Z_{j0}$ and $E_0 q_{jk} = q_{j0}$). Then the
increase in average education caused a reduction in actual returns to education of everybody in
the labor market (equation (4)), but it only affected the expected returns of the cohorts exposed
to the program. Consider the average education of an old cohort, not exposed to the program,
denoted 0, and a younger cohort of age $k$ in 1974, denoted $k$. Assuming (to simplify notation)
that there are only two cohorts and that they have the same size,^17 I can compute the expressions
for these averages implied by the rational expectations equilibrium of this model.^18
The difference between these two averages takes the following form:

$$S_{jk} - S_{j0} = \pi_0 + \pi_1 (Z_{jk} - Z_{j0}) + \pi_2 (q_{jk} - q_{j0}) + \xi_j, \tag{5}$$
where $\pi_0$ is a constant term that incorporates the changes in the expected average education in the country, $\pi_1 = -\alpha_1/(\phi + \beta_1/2)$, $\pi_2 = \beta_3/(\phi + \beta_1/2)$, and $\xi_j$ reflects changes in all other factors affecting the returns to and costs of education.
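The expressions for $\pi_1$ and $\pi_2$ can be checked numerically. The Python sketch below is a hypothetical illustration, not the paper's code: it solves the two-cohort equilibrium of equations (2) to (4) for a single region with individual deviations averaged out, lumps every term that is identical for both cohorts into one constant `base`, and holds the expected national average fixed, so that $\pi_0 = 0$ here. All parameter values are invented.

```python
import math


def cohort_gap(phi, alpha1, beta1, beta3, d_z, d_q, base=1.0):
    """Equilibrium S_jk - S_j0 in a two-cohort region (hypothetical sketch).

    base lumps the terms common to both cohorts (-beta2 * E[S_bar], v_j, -mu_j,
    beta3 * q_j0 and -alpha1 * Z_j0); d_z and d_q are the program-induced
    changes in schools per capita and in quality for the young cohort.
    """
    # Old cohort: does not anticipate the program, so it expects the young
    # cohort to match it and the regional average to equal s0:
    # phi * s0 = base - beta1 * s0.
    s0 = base / (phi + beta1)
    # Young cohort: knows s0 and the program; the regional average is (s0 + sk) / 2:
    # phi * sk = base + beta3 * d_q - alpha1 * d_z - beta1 * (s0 + sk) / 2.
    sk = (base + beta3 * d_q - alpha1 * d_z - beta1 * s0 / 2) / (phi + beta1 / 2)
    return sk - s0


def pi_coefficients(phi, alpha1, beta1, beta3):
    """Closed-form pi_1 and pi_2 of equation (5)."""
    denom = phi + beta1 / 2
    return -alpha1 / denom, beta3 / denom


# alpha1 < 0: more schools per capita lower the marginal cost of schooling.
phi, alpha1, beta1, beta3 = 2.0, -0.5, 1.0, 0.3
pi1, pi2 = pi_coefficients(phi, alpha1, beta1, beta3)
gap = cohort_gap(phi, alpha1, beta1, beta3, d_z=1.0, d_q=0.5)
print(gap, pi1 * 1.0 + pi2 * 0.5)
```

With these made-up values both printed numbers are about 0.26: the equilibrium gap equals $\pi_1 \Delta Z + \pi_2 \Delta q$, and with no program ($\Delta Z = \Delta q = 0$) the gap is zero.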

The increase in the number of primary schools per capita available between any cohort and the oldest cohort in the sample is a function of the total number of INPRES schools per capita constructed in the region, denoted $P_j$, and of the exposure of this cohort to the program, denoted $e_k$, which equals 0 for nonexposed cohorts ($k$ greater than 13) and 1 for a fully exposed cohort: $Z_{jk} - Z_{j0} = e_k P_j$. The quality of schooling is not directly observed. I capture the possibility that the program affected the quality of education by writing the change in the quality of schooling as $q_{jk} - q_{j0} = e_k(\lambda P_j + \rho_j)$. Substituting both expressions into equation (5), I can write

$$S_{jk} - S_{j0} = \pi_0 + e_k(\pi_1 + \pi_2 \lambda) P_j + \xi'_j, \tag{6}$$

where $\xi'_j = \xi_j + \pi_2 e_k \rho_j$.

The strategy implemented in this paper amounts to estimating equation (6) by weighted least squares for various cohorts. The structure of the program implies that the coefficient of $P_j$ in equation (6) should be 0 for $k > 13$. This restriction provides a test of the identification assumption of this paper, which is that there were no other region-specific changes in the returns to and costs of education correlated with the program.

If each region is an isolated labor market, returns to education in a given region do not depend on the average education in other regions ($\beta_2 = 0$), and the coefficient of $P_j$ in equation (6) is the reduced-form effect of the program on average education (taking into account potential changes in quality induced by the program). If, on the contrary, returns to education depend not on the average education in the region but only on the average education in the country as a whole ($\beta_1 = 0$), then this coefficient identifies how much the supply of educated labor shifted in response to the program (a behavioral parameter of interest). In the most general model, I can identify only a mixture of the two.

I can now use equation (1) to express the average of the logarithm of 1995 earnings for a given region and cohort as follows:

$$y_{jk} = a_{jk} + \frac{1}{N_{jk}} \sum_{i=1}^{N_{jk}} b_{ijk} S_{ijk} = a_{jk} + b_{jk} S_{jk} + \varepsilon_{jk}, \tag{7}$$

where $N_{jk}$ is the number of individuals in region $j$, in cohort $k$, and $\varepsilon_{jk} = \frac{1}{N_{jk}} \sum_{i=1}^{N_{jk}} \nu_i S_{ijk}$.
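Equation (7) is an aggregation identity, and a few lines of Python make the bookkeeping concrete. This is a sketch on simulated data under hypothetical assumptions: a single region-cohort cell, a common intercept $a_{jk}$ for everyone in the cell, and invented values for $b_{jk}$, the deviations $\nu_i$, and years of schooling.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500                                      # N_jk, individuals in the cell
a_jk, b_jk = 6.0, 0.07                       # hypothetical intercept and mean return
nu = rng.normal(0.0, 0.02, n)                # individual deviations nu_i
s = rng.integers(0, 13, n).astype(float)     # hypothetical years of schooling S_ijk

# Equation (1) with b_ijk = b_jk + nu_i, for every individual in the cell.
y = a_jk + (b_jk + nu) * s

# Equation (7): the cell mean of log wages equals a_jk + b_jk * mean(S) + eps_jk,
# where eps_jk is the cell average of nu_i * S_ijk.
eps_jk = np.mean(nu * s)
lhs = y.mean()
rhs = a_jk + b_jk * s.mean() + eps_jk
print(lhs, rhs)
```

The two printed numbers agree up to floating-point rounding, whatever the simulated draws.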
Using the definition of $b_{jk}$ (equation (4)) and substituting for $S_{jk}$ (equation (6)), I get the following expression for the difference between the logarithms of the 1995 wages of a younger cohort and an old cohort:

$$y_{jk} - y_{j0} = a_{jk} - a_{j0} + e_k(b\pi_1 + \lambda\beta_3) P_j + \eta_j, \tag{8}$$

where $b$ is the average return to education of the cohort educated in the old schools (i.e., before any change in quality due to the program). The term $\eta_j$ represents the deviation of region-specific returns to education from the national average. It could be related to program intensity for two reasons. First, there is a relationship between the education of the old cohort (which largely determined program placement) and regional returns to education (this is an equilibrium relationship, as in equations (2) and (4), so the sign of the correlation is uncertain). Second, general equilibrium effects may cause returns to education to decrease more (and therefore be lower in 1995) in the regions that received more schools.

As in the case of education, the identification strategy is to estimate equation (8) for various cohorts. I test whether the coefficients of the program are equal to 0 for $k > 13$. This is a test of the identification assumptions that regional changes in the intercept of the earnings function are uncorrelated with the program and that region-specific returns do not differ in a way that is related to program intensity.

Estimating equation (8) gives the average impact of the program on private wages due to the average increase in education and to potential changes in school quality. Wages are observed in 1995, after returns have reached their new equilibrium level. Therefore, this estimate does not account for the potential loss due to the reduction in returns to schooling caused by general equilibrium effects. If I assume, in addition, that $\lambda$ is equal to 0, namely that the program did not affect the quality of education, the coefficient of $P_j$ in equation (8) reduces to $e_k b \pi_1$. An estimate of $b$, the average return to education, can then be obtained by dividing the coefficient of $P_j$ in the wage equation by the coefficient of $P_j$ in the education equation. This is a simple instrumental variables estimator (indirect least squares). Two-stage least squares estimators are presented in section V. In addition, if there is a subgroup in which $\pi_1$ is known to be equal to 0, the coefficient of $P_j$ in equation (8) for this subgroup gives the pure quality effect of the program on earnings. This fact will be used to test the assumption that $\lambda$ is indeed 0.
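The indirect least squares logic can be sketched in Python on simulated region-level data. Everything here is hypothetical: the distribution of $P_j$, the values of $\pi_1$ and $b$, and the noise terms are invented, $\lambda = 0$ and $e_k = 1$ are imposed, and each region contributes one young-minus-old difference in mean education and in mean log wages. The ratio of the two reduced-form slopes recovers $b$ because the wage difference depends on $P_j$ only through the education difference.

```python
import numpy as np


def reduced_form_slope(outcome, p):
    """OLS slope of a region-level outcome on program intensity P_j (with a constant)."""
    x = np.column_stack([np.ones_like(p), p])
    coef, *_ = np.linalg.lstsq(x, outcome, rcond=None)
    return coef[1]


def indirect_least_squares(delta_s, delta_y, p):
    """Wage reduced-form slope divided by the education reduced-form slope."""
    return reduced_form_slope(delta_y, p) / reduced_form_slope(delta_s, p)


rng = np.random.default_rng(0)
n_regions = 2000
p = rng.uniform(0.5, 3.5, n_regions)      # hypothetical schools per 1,000 children
true_pi1, true_b = 0.15, 0.08             # hypothetical pi_1 and return b
# Young-minus-old differences by region, with lambda = 0 (no quality effect).
delta_s = 0.3 + true_pi1 * p + rng.normal(0.0, 0.2, n_regions)
delta_y = 0.01 + true_b * delta_s + rng.normal(0.0, 0.02, n_regions)
b_hat = indirect_least_squares(delta_s, delta_y, p)
print(b_hat)
```

With this many simulated regions the estimate should land close to the invented return of 0.08; with few regions or a weak first stage (small $\pi_1$) the ratio becomes noisy, which is one motivation for the pooled two-stage least squares estimates.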

To conclude, note that the above argument is valid under the assumptions of linear individual heterogeneity and linear returns to education. Under these assumptions, changes in education caused by the program are not correlated with individual-specific returns. If $\phi$ (the second derivative of the cost-of-schooling function) differed across individuals, or if returns to education were not linear, individual responses to the program might depend on returns to education. An instrumental variable estimator does not provide a consistent estimator of average returns to education in this case (see Heckman and Robb (1985) and Card (1995, 1999)). As shown by Angrist and Guido Imbens (1994, 1995) in the case of a binary instrument, if the instrument satisfies the two assumptions of independence and monotonicity, the instrumental variable estimate is a weighted average of returns to education for the individuals who, as a result of the program, changed their level of education. It is a causal parameter of interest for policy evaluation, since it measures the returns for people affected by this policy.

III.    Effect  on  Education 
A.  Basic  Results 

I start by estimating equation (6) for two large cohorts. In practice, I estimate

$$S_{ijk} = c_1 + \alpha_{1j} + \beta_{1k} + (P_j \times T_i)\gamma_1 + (C_j \times T_i)\delta_1 + \varepsilon_{ijk}, \tag{9}$$

where $T_i$ is a "treatment dummy" indicating whether the individual belongs to the "young" (or treated) cohort in the subsample, $c_1$ is a constant, $\beta_{1k}$ is a cohort-of-birth fixed effect, $\alpha_{1j}$ is a district-of-birth fixed effect, $P_j$ denotes the intensity of the program in the region of birth, and $C_j$ is a vector of region-specific variables.
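A minimal Python sketch of how equation (9) can be estimated on individual-level data, assuming no controls $C_j$, equal weights (plain OLS rather than the weighted least squares used in the paper), and invented values for program intensity, the fixed effects, and the true $\gamma_1$. The fixed effects enter as dummy columns, and $\gamma_1$ is the coefficient on the last column, $P_j \times T_i$.

```python
import numpy as np


def intensity_did(s, region, cohort, treated, p):
    """OLS of s on region FE, cohort FE and P_j * T_i; returns the estimate of gamma_1."""
    region_fe = np.eye(region.max() + 1)[region]          # one dummy per region (absorbs c_1)
    cohort_fe = np.eye(cohort.max() + 1)[cohort][:, 1:]   # drop one cohort to avoid collinearity
    interaction = (p[region] * treated)[:, None]          # P_j * T_i
    x = np.hstack([region_fe, cohort_fe, interaction])
    coef, *_ = np.linalg.lstsq(x, s, rcond=None)
    return coef[-1]


rng = np.random.default_rng(2)
n, n_regions, n_cohorts = 20000, 40, 10
region = rng.integers(0, n_regions, n)
cohort = rng.integers(0, n_cohorts, n)
treated = (cohort >= 5).astype(float)              # hypothetical "young" cohorts
p = rng.uniform(0.0, 4.0, n_regions)               # hypothetical schools per 1,000 children
region_effect = rng.normal(6.0, 1.0, n_regions)    # hypothetical alpha_1j
s = (region_effect[region] + 0.1 * cohort          # 0.1 * cohort stands in for beta_1k
     + 0.15 * p[region] * treated                  # true gamma_1 = 0.15 (invented)
     + rng.normal(0.0, 1.0, n))
gamma_hat = intensity_did(s, region, cohort, treated, p)
print(gamma_hat)
```

The estimate should be near the invented 0.15. Adding the $C_j \times T_i$ controls would mean appending more interaction columns before the last one.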

In table 4 (columns 1-3), I present estimates of equation (9) for two subsamples. In panel A, I compare children aged 2 to 6 in 1974 with children aged 12 to 17 in 1974. In column 1, I control only for the interaction of a cohort-of-birth dummy and the population aged 5 to 14 in 1971. The estimates suggest that one school built per 1,000 children increased the education of children aged 2 to 6 in 1974 by 0.12 years in the whole sample and by 0.20 years in the sample of wage earners.

This result relies on the identification assumption that there are no omitted time-varying, region-specific effects correlated with the program. The allocation of schools to each region was a function of the enrollment rate in the region in 1972. Therefore, this assumption will be violated if changes in the expected and actual determinants of the returns to and costs of education are correlated with initial levels. It will also be violated if the allocation of other government programs initiated as a result of the oil boom (and potentially affecting education) was correlated with the allocation of INPRES schools. The identification assumption might therefore be satisfied only after controlling for these factors. Thus, I present specifications that control for the interactions between cohort dummies and the enrollment rate in the population in 1971, as well as for interactions between cohort dummies and the allocation of the water and sanitation program, the second-largest centrally administered INPRES program at the time. Controlling for the enrollment rate and the water and sanitation program makes the estimates higher (columns 2 and 3), suggesting that the estimates are not biased upward by mean reversion or omitted programs.

In table 4, panel B, I show the results of the control experiment (comparing the cohort aged 12 to 17 with the cohort aged 18 to 24 in 1974). The estimated impact of the "program" is very small and never significant. The coefficients are statistically different from the corresponding coefficients in panel A.

Figure 1 plots the difference in education between the young and the old cohorts against program intensity in each region. The regression line corresponds to the weighted least squares estimate of equation (6) (the coefficients are presented in table 4, column 1). I also plotted the kernel estimator, which shows that the effect of the program on education is approximately linear. This justifies the linearity assumption used to derive equation (6). In the control experiment (panel B), both the regression line and the nonparametric regression are flat.
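The text does not specify the kernel or bandwidth behind the nonparametric curve, so the Python sketch below simply assumes a Gaussian-kernel Nadaraya-Watson (local-constant) regression with optional region weights. The bandwidth and the toy data are arbitrary illustrations, not the paper's choices.

```python
import numpy as np


def kernel_regression(x, y, grid, bandwidth, weights=None):
    """Nadaraya-Watson (local-constant) regression with a Gaussian kernel.

    x, y: region-level program intensity and change in the outcome;
    weights: optional region weights (e.g. population); grid: evaluation points.
    """
    x, y, grid = np.asarray(x, float), np.asarray(y, float), np.asarray(grid, float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, float)
    u = (grid[:, None] - x[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2) * w[None, :]      # kernel weight of each region at each grid point
    return (k @ y) / k.sum(axis=1)


# Toy check: with evenly spaced x and an exactly linear relationship, the
# estimate at the center of the data equals the line, because the symmetric
# kernel weights cancel the slope term there.
x = np.linspace(0.0, 4.0, 81)
y = 2.0 + 0.5 * x
fit = kernel_regression(x, y, grid=[2.0], bandwidth=0.5)
print(fit)
```

Near the edges of the support a local-constant fit is biased toward the interior, so a curve that bends at the extremes of program intensity need not indicate nonlinearity.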

B. Reduced-Form Evidence

The identification strategy discussed in the previous section can be generalized to an analysis with a full set of interaction terms.

Consider the following relationship between the education ($S_{ijk}$) of an individual $i$, born in region $j$, in year $k$, and his exposure to the program:

$$S_{ijk} = c_1 + \alpha_{1j} + \beta_{1k} + \sum_{l=2}^{23} (P_j \times d_{il})\gamma_{1l} + \sum_{l=2}^{23} (C_j \times d_{il})\delta_{1l} + \varepsilon_{ijk}, \tag{10}$$

where $d_{il}$ is a dummy that indicates whether individual $i$ is of age $l$ in 1974 (a year-of-birth dummy). In these unrestricted estimates, I measure the time dimension of exposure to the program with 22 age dummies (for being 2 to 23 in 1974). The omitted dummy is the dummy for being 24 in 1974 (individuals aged 24 in 1974 form the control group). Each coefficient $\gamma_{1l}$ can be interpreted as an estimate of the impact of the program on a given cohort. This is simply a generalization of equation (6) to estimate cohort-by-cohort contrasts. Because children aged 13 and older in 1974 did not benefit from the program, the coefficients $\gamma_{1l}$ should be 0 for $l > 12$ and start increasing for $l$ below some threshold (the oldest age at which an individual could have been exposed to the program and still benefited from it). The only a priori restriction on this threshold is that it is smaller than 12.

Table A1 presents unrestricted reduced-form estimates of three specifications. These reduced-form estimates allow me to check whether the $\gamma_{1l}$ in equation (10) follow the expected pattern. In figure 2, I have plotted the $\gamma_{1l}$. Each dot on the solid line is the coefficient of the interaction between a dummy for being a given age in 1974 and the number of schools constructed per 1,000 children in the region of birth (a 95 percent confidence interval is plotted in broken lines). Each dot summarizes the effect of the between-region variation in program intensity on a given cohort. For example, a child aged 6 in 1974 received 0.2 additional years of education if he was born in a region that received one more INPRES school per 1,000 children. These coefficients fluctuate around 0 until age 12 and start increasing after age 12. As expected, the program had no effect on the education of cohorts not exposed to it, and it had a positive effect on the education of younger cohorts. All coefficients are significantly different from 0 after age 8. To show the discontinuity in the trend more clearly, I plotted in figure 3b a smoothed version of the same data: for each $l$, I plotted the average of $\gamma_{1,l-1}$, $\gamma_{1l}$, and $\gamma_{1,l+1}$. The coefficients are close to 0 until age 11, and then they increase sharply. This pattern is similar across specifications. Omitted changes in regional conditions should have affected the education of some children older than 12 in 1974 (for example, an increase in regional income would have affected junior high school education at least as much as primary education) and therefore would not generate the same pattern. From these graphs, it therefore appears that the identification strategy is reasonable and that the program had an effect on education.
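The smoothing used for figure 3b is a centered three-point moving average. A small Python sketch follows; how the paper treats the youngest and oldest ages, which have only one neighbour, is not stated, so this version simply drops them (an assumption).

```python
import numpy as np


def smooth_three(gamma):
    """Average of each coefficient and its two neighbours (interior points only).

    gamma must be ordered by age; the first and last ages, which lack one
    neighbour, are dropped rather than padded.
    """
    gamma = np.asarray(gamma, float)
    return np.convolve(gamma, np.ones(3) / 3.0, mode="valid")


smoothed = smooth_three([0.0, 0.0, 0.0, 3.0, 6.0])
print(smoothed)
```

For the toy input the result is `[0. 1. 3.]`: the isolated jump is spread over neighbouring ages, which dampens single-cohort noise at the cost of blurring the exact age at which the trend breaks.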
C. Restricted Estimation

Instead of testing whether the $\gamma_{1l}$ are equal to 0 for $l > 12$, one can impose this restriction. The equation to be estimated is then

$$S_{ijk} = c_1 + \alpha_{1j} + \beta_{1k} + \sum_{l=2}^{12} (P_j \times d_{il})\gamma_{1l} + \sum_{l=2}^{12} (C_j \times d_{il})\delta_{1l} + \varepsilon_{ijk}. \tag{11}$$

The  omitted  group  (the  control  group)  is  now  formed  of  individuals  aged  13  to  24  in  1974.  This 
is  a  more  efficient  way  to  estimate  the  program  effect  and  leads  to  more  precise  estimates. 

Columns 1-3 in table 5 show the coefficients of the interactions between age in 1974 and the intensity of the program in the region of birth in three specifications for the whole sample (columns 4-6 show the same results for the sample of wage earners). In all columns, the estimated effect is positive after age 10. All coefficients are significantly greater than 0 after age 8. All sets of interactions are jointly statistically different from 0 (the F-statistic for the null hypothesis is presented at the bottom of the table). The coefficients generally increase with date of birth (decrease with age), except for a high value at age 9 and a decline between ages 6 and 5. They increase faster between ages 12 and 9 than they do subsequently; this suggests that once the education level in the population reaches a certain point, increasing it further by building primary schools becomes more difficult.

The estimates in column 1 (without controls) suggest that one school per 1,000 children increases the education of the youngest children by 0.14 years. On average, 1.98 schools were built per 1,000 children. This implies that, at its mean value, the program caused an increase in education of 0.27 years for these children (the average education in the sample is 7.98 years). As before, controlling for the enrollment rate in 1971 (column 2) and for the water and sanitation program (column 3) makes the estimate slightly higher. In columns 4-6, I present the same estimates for the subsample of wage earners. The program effect is higher for wage earners than in the whole sample.

More insight into why and how this program was effective is obtained by examining its impact in different types of regions. In table 6 (panel A), I present results equivalent to the specification in table 4 (equation (9)) for various subsamples of regions of birth. In columns 2 and 3, I present the program effect in sparsely and densely populated regions. In sparsely populated regions, each school constructed is likely to reduce the distance to school significantly (if the schools are placed relatively evenly in space). In densely populated regions, the main effect will be not to reduce the distance to school, but to increase the availability of slots or reduce overcrowding in old schools. Therefore, the difference between the program effects in these two types of regions provides some information on whether reducing the distance to school or reducing overcrowding was the reason why the program was effective. The estimated program effect is 0 in densely populated regions, while it is 0.19 in sparsely populated regions (columns 2 and 3). This suggests that reducing the distance to school was the most important effect of the program. This interpretation should, however, be taken with caution, since the difference may come from other characteristics correlated with density. In columns 4 and 5, I present the results for provinces where the incidence of poverty in 1976 was higher and lower than the Indonesian average. I find a larger effect in poor provinces. In columns 6 and 7, I divide the sample into regions where the education of the cohort not exposed to the program (men born between 1950 and 1962) was lower or higher than the median (3.08 years of education). Results are similar for both sets of regions.

In summary, it appears that the school construction program had an impact on education. The causal interpretation of these estimates is supported by the preprogram tests. It should be recalled that this program was accompanied by a general effort by the Indonesian government in favor of education, a priority of the second five-year plan. As part of this effort, primary school fees were abolished in 1978 (World Bank 1990a, 1990b). Therefore, these results should be generalized to less favorable contexts only with caution.

D. At What Level of Education Was the Program Effective?

The impact of the program on welfare depends on whether it primarily affected children with a low or a high level of education. For this reason, it is important to examine at what level of education the program was effective. The simplest way to investigate this question is to use a difference-in-differences estimator.

I group the regions into high- and low-program regions and consider a cohort aged 2 to 6 in 1974 and a cohort aged 12 to 17 in 1974. Instead of considering only differences in group means, I consider differences in the cumulative distribution functions (CDF) of education (the probabilities of completing any given level of education or less). Figures 4 and 5 show that in both regions, the CDF of education in the younger cohort stochastically dominates the CDF of education in the older cohort. Moreover, the CDF of education in the low-program regions stochastically dominates the CDF of education in the high-program regions for both cohorts, a consequence of the higher intensity of the program in regions with lower initial enrollment. Figure 6 plots the differences in CDF between the two cohorts for both types of regions. The between-cohort difference in CDF is larger in the low-program regions for the first five years of education, but smaller after the sixth year (the last year of primary school). Figure 7 shows the difference in differences in CDF (the 95 percent confidence interval is plotted in broken lines). The dot for the fifth year of education, for example, indicates that the program induced 6 percent of the sample to complete six years of education or more (i.e., graduate from primary school) instead of five or less. This shows a positive effect of the program at all primary school levels. It had no effect at the junior high school level. A significantly negative effect of the interaction is shown for levels of education of nine years and above.
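The difference in differences in CDF can be sketched in Python as follows. The sign convention is an assumption chosen to match the reading of figure 7 above: for each level $m$, the between-cohort fall in $P(S \le m)$ in high-program regions minus the same fall in low-program regions, so a positive value means that, relative to low-program regions, more people completed more than $m$ years. The toy data are invented.

```python
import numpy as np


def ecdf(values, levels):
    """Share of observations with at most m years of education, for each m."""
    values = np.asarray(values)
    return np.array([(values <= m).mean() for m in levels])


def cdf_diff_in_diff(high_old, high_young, low_old, low_young, levels):
    """Between-cohort fall in the CDF in high-program regions minus the same
    fall in low-program regions; positive values mean that, relative to
    low-program regions, more people completed more than m years."""
    fall_high = ecdf(high_old, levels) - ecdf(high_young, levels)
    fall_low = ecdf(low_old, levels) - ecdf(low_young, levels)
    return fall_high - fall_low


# Toy years of education (not from the paper).
dd = cdf_diff_in_diff(high_old=[3, 5, 6, 6], high_young=[6, 6, 6, 6],
                      low_old=[5, 6, 6, 6], low_young=[5, 6, 6, 6],
                      levels=[5, 6])
print(dd)
```

In the toy data, half of the old high-program cohort had five years or less and nobody in the young cohort did, while the low-program regions did not change, so the value at $m = 5$ is 0.5 and at $m = 6$ it is 0.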

I obtain more precise estimates by including a full set of region-of-birth and year-of-birth dummies. To estimate an equivalent of this difference in differences controlling for these variables, I estimate a linear probability model for the probability of completing $m$ years of education or less. With $S_{ijkm}$ a dummy that indicates whether individual $i$, born in region $j$, in year $k$, completed $m$ years of education or less, and $P_j$ a dummy indicating whether the child was born in a high-program region, I estimate the following equation:

$$S_{ijkm} = c + \alpha_j + \beta_k + (P_j \times T_i)\kappa_m + \varepsilon_{ijk}. \tag{12}$$

The $\kappa_m$, for $m = 0$ to 19, are the estimated impacts of the program at each level of education. They are plotted in figure 8 (the 95 percent confidence interval is plotted in broken lines).

The shape of this function and the shape of the function estimated from the difference in differences in the CDF are similar. Both rise until the fifth year of education, decrease until the twelfth, and increase slightly thereafter. A maximum of about 6 percent of the sample living in high-program regions was induced to complete at least primary school. The estimates also show some impact of the program on the probability of completing lower secondary school (1.5 percent of the sample is estimated to have been induced by the program to complete 7th, 8th, and 9th grade or more). There is a negative difference in differences at the senior high school level.

The analysis of the CDF suggests that the program increased schooling, but almost entirely through increasing primary schooling. This provides additional evidence that the assumption underlying the identification strategy is reasonable: the estimated effect of the program is small or nil for the levels of education that it did not target.

The program could have induced more marginal people to complete primary school and
move on to junior high school. However, the direct and indirect costs of junior high school
are  much  higher  than  the  costs  of  primary  education  and  were  not  equalized  across  regions  at 
the  time.  Individuals  induced  by  the  program  to  complete  primary  school  are  presumably  those 
who  were  facing  a  high  cost  of  education  before  the  program.  Therefore,  the  marginal  cost  of  a 
year  in  junior  high  school  must  have  remained  high,  since  this  was  not  affected  by  governmental 
intervention.  This  explains  why  we  do  not  observe  large  spillovers.  The  test  of  human  capital 
versus  sorting  models  of  returns  to  education  proposed  by  Kevin  Lang  and  David  Kropp  (1986) 
provides  another  interesting  aspect  of  this  result.  They  show  that  the  sorting  model  implies 
that  compulsory  attendance  laws,  which  affect  the  education  of  the  low-skill  workers,  should 
also  affect  the  education  of  the  high-skill  workers  (who  have  to  get  more  education  to  show  that 
they  are  different).  Under  the  human  capital  model,  compulsory  attendance  laws  should  not 
affect  the  education  of  people  who  are  not  directly  constrained  by  them.  The  INPRES  program 
directly  affected  primary  education  only,  but  under  the  sorting  model,  it  could  have  led  some 
people  who  would  have  completed  only  primary  school  to  complete  more  years  of  education. 
The  results  in  this  paper  suggest  that  the  human  capital  model  of  education  might  describe 
Indonesia  better  than  the  sorting  model. 

The  program  was  effective  in  increasing  education,  in  particular  at  the  primary  school  level. 
Did  it  increase  human  capital?  One  way  to  answer  that  question  is  to  look  at  the  effect  of  the 
program  on  wages. 
IV. Effect on Wages

A.  Basic  Results 

I start by estimating a specification equivalent to equation (8) for the experiment of interest and the control experiment. As for education (equation (9)), I estimate

$$y_{ijk} = c_1 + \alpha_{1j} + \beta_{1k} + (P_j \times T_i)\gamma_1 + (C_j \times T_i)\delta_1 + \varepsilon_{ijk}, \tag{13}$$

where $y_{ijk}$ is the logarithm of the 1995 wage of an individual $i$, born in region $j$, in cohort $k$.

Results are presented in table 4 (columns 4-6) and in figure 1. In table 4, panel A, I set $T_i$ equal to 1 for children aged 2 to 6 in 1974, and I use children aged 12 to 17 as the comparison group. In figure 1, panel A, I plot the increase in wages between the same two cohorts against program intensity, together with the weighted least squares regression line (whose coefficients are given in column 4 of table 4) and the kernel regression curve. In table 4, panel B, I set $T_i$ equal to 1 for children aged 12 to 17 in 1974 and use children aged 18 to 24 as the comparison group. Corresponding evidence is plotted in figure 1, panel B.

In table 4, panel A, the estimates range from 1.5 percent to 2.7 percent. As in the case of education, the estimates increase when I control for enrollment rates in 1971 and for the allocation of the water and sanitation program, although none of these estimates are significantly different from each other. In panel B, in all specifications, the interaction coefficient is small and not significantly different from 0. However, these estimates are imprecise, and I cannot reject equality of the coefficients in panels A and B (although the point estimates are much smaller in panel B). Qualitatively, the results of estimating this reduced-form expression nevertheless lead to conclusions similar to those for education. The program seems to have affected average wages, the estimates do not fall when control variables are introduced, and the point estimate of the program effect in an untreated sample is smaller and close to 0.

B. Reduced-Form Evidence

As for education, I can now write an unrestricted reduced-form relationship between exposure to the program and the logarithm of the wage of an individual ($y_{ijk}$):
$$y_{ijk} = c_2 + \alpha_{2j} + \beta_{2k} + \sum_{l=2}^{23} (P_j \times d_{il})\gamma_{2l} + \sum_{l=2}^{23} (C_j \times d_{il})\delta_{2l} + \nu_{ijk}, \tag{14}$$

where $\alpha_{2j}$ is a region-of-birth effect and $\beta_{2k}$ is a cohort-of-birth effect. $P_j$, $C_j$, and $d_{il}$ are defined as in the education equation: $P_j$ is the intensity of the program in the region of birth, $C_j$ is the vector of control variables, and $d_{il}$ is a dummy indicating whether individual $i$ was of age $l$ in 1974.

The $\gamma_{2l}$ should be 0 for $l$ greater than 12 and start increasing after some threshold. Moreover, if the program affected wages only through its effect on education, the coefficients $\gamma_{2l}$ should track the $\gamma_{1l}$ (in the education equation). In particular, the threshold after which the coefficients $\gamma_{2l}$ start to increase should be the same as the threshold after which the coefficients $\gamma_{1l}$ start to increase. The $\gamma_{2l}$ should also track the ups and downs of the $\gamma_{1l}$.

Table A1 presents the results for the three specifications for which I estimated the education equation. Again, graphs help interpret the reduced-form coefficients. In figure 3a, the $\gamma_{2l}$ are plotted in a dotted line. They oscillate until age 10 and start increasing after age 11. The coefficients of the interactions for education and wages track each other. Figure 3b presents the same data but shows the impact of the program more clearly: the values on this graph are smoothed and the scales are adjusted. In this figure, the change in trend after age 11 is very apparent. The program began having a positive effect on wages and education at that point.

C.  Restricted  Estimates 

In  columns  7-9  in  table  5,  I  present  estimates  of  the  equation 

$$y_{ijk} = c_2 + \alpha_{2j} + \beta_{2k} + \sum_{l=2}^{12} (P_j \times d_{il})\gamma_{2l} + \sum_{l=2}^{12} (C_j \times d_{il})\delta_{2l} + \nu_{ijk}. \tag{15}$$

It is more difficult to precisely estimate the effect of the program on wages than on education, because wages fluctuate more and the sample is smaller (since wages are not collected for self-employed people). Not surprisingly, then, I find that few coefficients are individually significant and that the F-statistics for the joint significance of the instruments are small. Qualitatively, the results parallel the estimated effects on education. No effect is found for children aged 10 or older in 1974. The coefficients are positive for younger children (except at age 7). The coefficients of the interactions generally decrease with age. The estimates are higher when I control for the enrollment rate and for the water and sanitation program. The last line in this table indicates that constructing one school per 1,000 children increased the 1995 wages of individuals aged 2 in 1974 by 1.6 percent to 4.0 percent. The average number of schools constructed per 1,000 children is 1.89 in the sample with valid wage data. Thus, on average, the program caused a 3-7 percent increase in the wages of this cohort.
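The average-effect statement is a simple rescaling: the per-school effects for the cohort aged 2 in 1974 are multiplied by the mean intensity in the wage sample. A sketch of that arithmetic, treating the 1.6 and 4.0 percent figures as proportional wage effects:

```python
mean_intensity = 1.89                # schools per 1,000 children, wage sample
effect_per_school = (0.016, 0.040)   # 1.6 and 4.0 percent per school per 1,000 children

# Average program effect on the cohort, in percent.
implied = [round(100 * mean_intensity * e, 2) for e in effect_per_school]
print(implied)  # [3.02, 7.56]
```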

In  table  6  (panel  B),  I  present  the  estimates  of  equation  (13)  for  different  subsamples.  The 
variations  of  the  program  effect  across  subsamples  parallel  the  variations  of  the  program  effect 
on  education.  In  particular,  I  see  no  effect  on  wages  in  regions  where  there  is  no  effect  on 
years  of  education.  This  suggests  that  the  program  effect  on  wages  was  probably  caused  by  the 
changes  in  years  of  education.  In  the  next  section,  I  use  this  to  construct  instrumental  variables 
estimates  of  the  effect  of  education  on  wages. 

V.    Estimating  Returns  to  Education 

The  identification  assumption  that  the  evolution  of  wages  and  education  across  cohorts  would 
not  have  varied  systematically  from  one  region  to  another,  in  the  absence  of  the  program,  is 
sufficient  to  estimate  the  impact  of  the  program.  Since  the  intervention  was  to  build  primary 
schools,  the  program  effect  on  wages  was  most  probably  caused  by  changes  in  education.  The 
additional  assumption  needed  is  that  the  program  had  no  effect  on  wages  other  than  by  increasing 
education.  I  can,  therefore,  use  this  program  to  construct  instrumental  variables  estimates 
of  the  impact  of  additional  years  of  education  on  wages.  The  most  serious  concern,  for  this 
interpretation,  is  that  the  program  might  have  affected  both  the  quality  and  the  quantity  of 
education  and  that  changes  in  wages  could  reflect  both  effects.  Below,  I  examine  whether  there 
is  evidence  that  this  is  a  serious  problem. 

A.  Indirect  Least  Squares  Estimates 

Following the discussion in section III, I calculate indirect least squares estimates of returns to education by dividing the estimate of the program effect on wages by the estimate of the program effect on education. For example, dividing the estimate in table 4, panel A, column 4, by the corresponding estimate in column 1 for wage earners, I obtain an estimate of average returns to education of 7.5 percent. Adding controls, I find 8.6 percent and 10.4 percent. This strategy can be extended to use all the available information by adopting a two-stage least squares (2SLS) strategy instead of the indirect least squares (ILS) approach.
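The ILS estimate is a ratio of two reduced-form coefficients. A minimal sketch, using illustrative inputs rather than the table 4 estimates:

```python
def ils_return(reduced_form_wage, first_stage_education):
    """Indirect least squares (Wald-type ratio) estimate of returns to education.

    reduced_form_wage:     program effect on log wages
    first_stage_education: program effect on years of education
    """
    if first_stage_education == 0:
        raise ValueError("zero first-stage effect: the ratio is undefined")
    return reduced_form_wage / first_stage_education


# Illustrative inputs only (not the table 4 estimates): a 0.015 log-point
# wage effect and a 0.2-year education effect imply a 7.5 percent return.
print(round(ils_return(0.015, 0.2), 3))
```

The ratio is only informative when the denominator is well away from zero; a small first-stage effect magnifies any noise in the wage effect.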

B.  Two-stage  Least  Squares  Estimates  of  the  Returns  to  Education 

Estimates  of  equation  (11)  are  of  intrinsic  interest  because  they  provide  an  assessment  of  the 
impact  of  the  program  on  education.  But  they  also  represent  the  first  stage  of  a  two-stage  least 
squares  estimation  of  the  impact  of  education  on  wages.  Equation  (1)  characterizes  the  causal 
effect of education on wages:

$$y_{ijk} = a_{ijk} + b_{ijk}\, S_{ijk}.$$

Rewrite  this  expression  as 

$$y_{ijk} = c + \alpha_j + \beta_k + S_{ijk}\, b + \eta_{ijk}, \tag{16}$$

where $\alpha_j$ and $\beta_k$ denote region-of-birth and cohort-of-birth effects. The region-specific error terms, the cohort-specific error terms, and the individual error terms incorporate individual and regional differences in returns to education and in the specific intercepts.

Under the assumptions discussed in section III, the interactions between age in 1974 and program intensity in the region of birth are available as instruments for equation (16). By limiting the set of instruments to the interactions in equation (11) (the age dummies for $l \le 12$), I avoid the potential small-sample bias caused by using many weakly correlated instruments. Moreover, the instruments have been shown to have good explanatory power in the first stage, which indicates that the 2SLS estimates should not be affected by this problem. I also estimated the same equation using a single instrument: the interaction between being in the "young" cohort and the program intensity in the region of birth. Equation (16) can also be modified to incorporate control variables as follows:

$$y_{ijk} = c + \alpha_j + \beta_k + S_{ijk}\, b + \sum_{l=2}^{12} (C_j \times d_{il})\,\delta_l + \eta_{ijk}. \tag{17}$$
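The two-stage logic can be illustrated with a minimal 2SLS sketch on simulated data, not on the SUPAS sample. It assumes a single hypothetical instrument `z` standing in for program intensity, an unobserved `ability` term that raises both schooling and wages, and a true return of 0.08; the fixed effects and control interactions of equation (17) are omitted for brevity, and all names are illustrative.

```python
import numpy as np


def two_sls(y, endog, exog, instruments):
    """Two-stage least squares with one endogenous regressor.

    y:           outcome, shape (n,)
    endog:       endogenous regressor, e.g. years of schooling, shape (n,)
    exog:        exogenous regressors used in both stages, shape (n, k)
    instruments: excluded instruments, shape (n,) or (n, m)
    Returns the second-stage coefficients; the last entry is on `endog`.
    """
    Z = np.column_stack([exog, instruments])  # full instrument set
    X = np.column_stack([exog, endog])        # structural regressors
    # First stage: fitted values of X from its projection onto Z.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: OLS of y on the fitted regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


rng = np.random.default_rng(0)
n = 50_000
z = rng.normal(size=n)            # hypothetical program-intensity instrument
ability = rng.normal(size=n)      # unobserved confounder
school = 6 + 1.0 * z + ability + rng.normal(size=n)
log_wage = 1 + 0.08 * school + 0.5 * ability + rng.normal(size=n)

const = np.ones((n, 1))
b_iv = two_sls(log_wage, school, const, z)[-1]
b_ols = np.linalg.lstsq(np.column_stack([const, school]), log_wage, rcond=None)[0][-1]
print(f"OLS: {b_ols:.3f}  2SLS: {b_iv:.3f}")
```

In this simulation OLS is pushed upward by the omitted ability term, while 2SLS stays close to 0.08. Standard errors read off the naive second stage are not valid; they must be computed from the structural residuals, as dedicated IV routines do.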

The results are presented in table 7, panel A1 (panel A2 presents results with the logarithm of monthly earnings as the dependent variable). The first line shows the OLS estimate. The estimated return to education is 7.8 percent and is not affected by introducing control variables. This is lower than OLS estimates for Indonesia in older samples, but consistent with estimates from other Indonesian data sets of the 1990s and with the decline in estimated returns to education over time. The World Bank (1990b) reports that estimates decreased from 19 percent in 1982 to 10 percent in 1986.

The second line presents 2SLS estimates of equation (17) (the F-statistics of the overidentifying restrictions test are shown in square brackets). In columns 1-3, I present the 2SLS estimates for the three specifications used throughout the paper. In column 1, the number of children in 1971 is the only control variable. The point estimate (6.75 percent) is slightly lower than the OLS estimate, although I cannot reject equality. In column 2, I introduce interactions between the enrollment rate in 1971 and year-of-birth dummies. The point estimate is higher than without the controls (8.1 percent). When I introduce a control for the water and sanitation program, the estimate is higher again (10.6 percent). In the third line, I present the 2SLS estimate using only one instrument. The results are very similar to the IV estimates using more instruments, but slightly less precise, since they use less variation.

These 2SLS estimates are not very different from the OLS estimate. The general belief in the development literature (Behrman (1990), Strauss and Thomas (1995)) is that OLS estimates are likely to be biased upward due to omitted family and community background variables, but this does not seem to be confirmed here.

On the other hand, most studies in industrialized countries find instrumental variables (IV) estimates that are higher than OLS estimates, which is not what I find here (see Card (1995, 1999), Orley Ashenfelter, Colm P. Harmon and Hessel Oosterbeek (1998)). Card (1999) discusses several possible explanations for why IV estimates tend to be higher than OLS estimates. I consider them briefly to determine whether they apply in this context. The first explanation, following Zvi Griliches (1977), is that ability bias in the OLS estimates of returns to schooling is relatively small and that OLS estimates may in fact be biased downward due to measurement error. My results are consistent with the idea that the ability bias and the measurement error bias more or less cancel each other out in the Indonesian context. A second explanation is that the IV estimates are biased even further upward than the OLS estimates, due to unobserved differences in earning ability between the "treatment" and "control" groups. These differences in earnings are then "blown up" when they are divided by small differences in schooling. Card notes that this problem is likely to be less important when the instruments exploit large variations in education and an interaction between two sources of variation, which is the case for my IV estimates. A third explanation, proposed by Ashenfelter, Harmon and Oosterbeek (1998), is that researchers prefer to report estimates with high t-statistics. With a relatively large sample and substantial variation in schooling, the standard errors of the IV estimates are small enough that even estimates smaller than OLS still have t-statistics above 2, so this mechanism is unlikely to operate here.

Finally,  Card  (1995)  shows  that  the  2SLS  estimate  might  not  be  an  estimate  of  average 
returns  to  schooling.  People  affected  by  the  instruments  might  be  people  who  have  higher 
marginal  returns  to  schooling.  This  is  the  case,  in  particular,  if  returns  to  education  are  concave 
and  individuals  with  low  levels  of  schooling  are  more  affected  than  others.  Only  people  who 
would  have  completed  less  than  primary  school  were  affected  by  the  program,  as  described  in 
section  IV.  However,  there  is  no  evidence  that  returns  to  education  are  concave  in  Indonesia. 
Estimating  nonparametrically  the  shape  of  the  true  causal  response  function  would  be  difficult, 
since  the  source  of  exogenous  variation  I  use  in  this  study  affects  only  primary  education.  But 
some  indication  that  the  returns  are  not  concave  is  given  by  OLS  estimation  using  a  dummy  for 
each  year  of  education.  These  coefficients  are  plotted  in  figure  9.  Estimated  marginal  returns 
vary  little  until  nine  years  of  education  but  are  high  for  the  twelfth  year  of  education  (the  last 
year  of  senior  high  school)  and  the  thirteenth  year  of  education.  The  high  returns  for  years 
corresponding  to  diplomas  indicate  a  "sheepskin  effect,"  found  in  the  United  States  as  well 
(Thomas  Hungerford  and  Gary  Solon  (1987),  David  A.  Jaeger  and  Marianne  E.  Page  (1996)). 
Strauss and Thomas (1997) also found evidence of apparent convex returns to education in urban Brazil. If returns are, in reality, linear or even convex in developing countries, the phenomenon of "discount rate bias" (Kevin Lang (1993)) emphasized by Card should not be present, which would again explain why OLS and 2SLS estimates are similar in my study.
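A regression of log wages on a full set of dummies for completed years of schooling, with no other regressors, reproduces the cell means, so the marginal return to each year is the difference between adjacent cell means. The sketch below assumes no additional controls (the specification behind figure 9 may include some) and uses made-up data; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def marginal_returns_by_year(years, log_wages):
    """Marginal return to each year of schooling from a saturated dummy model.

    With one dummy per completed year and no other regressors, each dummy
    coefficient equals the cell mean of log wages, so the marginal return to
    year s is mean(log wage | s) - mean(log wage | s - 1).
    """
    cells = defaultdict(list)
    for s, w in zip(years, log_wages):
        cells[s].append(w)
    means = {s: mean(ws) for s, ws in cells.items()}
    return {s: means[s] - means[s - 1] for s in sorted(means) if s - 1 in means}


# Made-up data with a jump at year 12 (a diploma year).
years = [9, 9, 10, 10, 11, 11, 12, 12]
log_wages = [2.0, 2.2, 2.05, 2.25, 2.1, 2.3, 2.5, 2.7]
print({s: round(r, 3) for s, r in marginal_returns_by_year(years, log_wages).items()})
```

A spike at diploma years with flat returns elsewhere is the "sheepskin" pattern described above; it is not evidence of concavity.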

In table 6, panel C, I examine whether returns to education vary across regions. They are higher in sparsely populated regions (11 percent) and in regions where the average education level of cohorts not exposed to the program is low (12 percent). They seem to be lower in regions where initial education was high, although the standard error of this estimate is too large to be conclusive. This last result is consistent with the idea that the general equilibrium effect of an increase in education is to depress the returns, but it suggests that even after the program, returns were still higher in regions that received more schools.

I  now  turn  to  two  potential  sources  of  bias:  the  assumption  that  the  program  had  no  impact 
on  wages  other  than  through  the  increase  in  the  quantity  of  education,  and  problems  arising 
from  sample  selection. 

C. Could Change in Quality Bias the 2SLS Estimates?

As I discussed in section III, estimates of returns to education are biased if the program affects both the quality and the quantity of education. I examine two pieces of evidence suggesting that the program did not substantially affect the quality of education.

First, I calculated average pupil/teacher ratios in 1973/74 and 1978/79. In both high- and low-program regions, the average pupil/teacher ratio increased slightly: from 29.7 to 31.5 in high-program regions, and from 28.4 to 30.1 in low-program regions. The difference in differences is very close to 0. No systematic difference in quality, as measured by this indicator, is apparent. I also ran a regression of the change in pupil/teacher ratios on the number of schools per child built in the program. The coefficient is negative, very small, and not significantly different from 0. However, the quality of education could still have deteriorated if the newly hired teachers had been less qualified.
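Written out from the reported pupil/teacher ratios (1973/74 to 1978/79), the difference-in-differences calculation is:

```latex
\begin{aligned}
\Delta_{\text{high}} &= 31.5 - 29.7 = 1.8,\\
\Delta_{\text{low}}  &= 30.1 - 28.4 = 1.7,\\
\Delta_{\text{high}} - \Delta_{\text{low}} &= 0.1 \text{ pupils per teacher.}
\end{aligned}
```

A gap of 0.1 pupils per teacher is negligible relative to baseline ratios of roughly 28 to 30.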

Second, I use the fact that the program did not increase the education of people completing nine years of education or more (as shown in section IV). Since the educational attainment of these people was not affected by the program, their wages should not be affected either, unless the program changed the quality of schooling. In figure 10, I show the coefficients of the interactions between program intensity and age dummies in the wage and education equations, in the sample of people whose level of education is greater than 9. In contrast to figures 1-3, no specific pattern emerges in either equation: the interaction coefficients fluctuate (they become negative for the youngest individuals in the education equation) and there is no break in trend after age 12. The evidence in table 6 can be interpreted along the same lines. In densely populated regions (column 4), the program has no effect on years of education. If the quality of education had changed and this had affected wages, then I should see an effect of the program on wages even in these regions.

These two separate pieces of evidence lend some support to the assumption that the increase in wages was due mainly to the increase in the quantity of education. There is no clear evidence against the assumption that the program affected only the quantity of education.

D.  Correction  for  Sample  Selection 

The  returns  to  education  are  estimated  in  a  selected  sample:  only  45  percent  of  the  individuals 
in  the  sample  are  working  for  a  wage.  Most  remaining  individuals  are  self-employed. 

The  probability  of  working  for  a  wage  is  potentially  affected  by  education.  To  examine  this, 
I use 2SLS to estimate

$$W_{ijk} = c + \alpha_j + \beta_k + S_{ijk}\,\lambda + \sum_{l=2}^{12} (C_j \times d_{il})\,\delta_l + \eta_{ijk}, \tag{18}$$

where $W_{ijk}$ is a dummy variable indicating whether an individual reports a positive wage. Estimates of this equation are presented in table 7, panel B1. The OLS estimate is much smaller than the IV estimate: the estimated impact of an additional year of education on the probability of working for a wage rises from 3.3 percentage points with OLS to 10.1 percentage points with 2SLS. Education therefore affects the probability of working for a wage.

This is an interesting result in itself. However, it casts some doubt on the validity of the 2SLS estimate of returns to education. Because the probability of working for a wage is also affected by schooling, sample selection is likely to induce a correlation between the instruments and the error in equation (17) (the conditional expectation of the error, given the instruments and the fact that an individual reports a positive wage, may not be 0). The ideal solution to this selection problem would be to find an instrument that is randomly assigned in the population with positive wages. Failing that, several econometric solutions have been proposed. Most rest on exclusion assumptions that would be strong in this context.

I implement two alternative procedures to investigate whether sample selection is likely to be an important problem in this case. First, I implement a sample selection correction procedure. Second, I use another Indonesian survey to impute an income to self-employed individuals in my sample. Results are not very sensitive to either modification.

First, I follow a suggestion introduced by Heckman and Hotz (1989), then elaborated by Hyungtaik Ahn and James L. Powell (1993) and Angrist (1995a), to condition in the second stage on the probability of selection given the instruments. In practice, I estimate the equation

$$W_{ijk} = c_3 + \alpha_{3j} + \beta_{3k} + \sum_{l=2}^{12} (P_j \times d_{il})\,\gamma_{3l} + \sum_{l=2}^{12} (C_j \times d_{il})\,\delta_{3l} + \varepsilon_{ijk}, \tag{19}$$

and  use  the  predicted  value  for  Wijk  (the  probability  of  being  selected  given  the  instruments) 
and  the  square  of  the  predicted  value  as  additional  regressors  in  equation  (17): 
$$y_{ijk} = c + \alpha_j + \beta_k + S_{ijk}\, b + \sum_{l=2}^{12} (C_j \times d_{il})\,\delta_l + \hat{W}_{ijk}\,\mu_1 + \hat{W}_{ijk}^2\,\mu_2 + \nu_{ijk}, \tag{20}$$

where $\hat{W}_{ijk}$ is the predicted value from equation (19). The instruments are interactions between year-of-birth dummies (for people 12 or younger in 1974) and program intensity in the region of birth.
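A minimal sketch of this two-step correction on simulated data, assuming three hypothetical instruments in place of the age-by-intensity interactions and omitting fixed effects and controls. Step one fits a linear probability model for working for a wage on the full sample, as in equation (19); step two runs 2SLS on wage earners only, adding the predicted probability and its square as exogenous controls, as in equation (20). In this simulation selection is independent of the wage error, so the check is only that the correction leaves the estimate near the true 0.08; it does not demonstrate bias removal.

```python
import numpy as np


def two_sls(y, X, Z):
    """2SLS where X and Z already contain every exogenous column."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage projection
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]    # second stage


def selection_corrected_return(log_wage, school, works, instruments):
    """Return to schooling, controlling for a quadratic in the predicted
    probability of selection (working for a wage).

    works:       1.0 if the person reports a wage, else 0.0 (full sample)
    instruments: (n, m) excluded instruments, full sample, no constant
    log_wage:    only used where works == 1
    """
    n = len(works)
    Z1 = np.column_stack([np.ones(n), instruments])
    # Step 1: linear probability model for selection on the full sample.
    p_hat = Z1 @ np.linalg.lstsq(Z1, works, rcond=None)[0]
    sel = works == 1
    controls = np.column_stack([np.ones(n), p_hat, p_hat ** 2])[sel]
    # Step 2: 2SLS on wage earners with p_hat and p_hat**2 as exogenous controls.
    X = np.column_stack([controls, school[sel]])
    Z = np.column_stack([controls, instruments[sel]])
    return two_sls(log_wage[sel], X, Z)[-1]


rng = np.random.default_rng(1)
n = 60_000
Zs = rng.normal(size=(n, 3))                     # hypothetical instruments
ability = rng.normal(size=n)
school = 6 + Zs @ np.array([1.0, 0.0, 0.5]) + ability + rng.normal(size=n)
works = (Zs @ np.array([0.0, 1.0, 0.5]) + rng.normal(size=n) > 0).astype(float)
log_wage = 1 + 0.08 * school + 0.5 * ability + rng.normal(size=n)
print(round(selection_corrected_return(log_wage, school, works, Zs), 3))
```

Because $\hat{W}$ is a linear function of the instruments, the second-stage instrument matrix is rank-deficient; `lstsq` still returns the correct projection, but identification now rests on the instruments' remaining variation, which is the fragility discussed below.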

The result of introducing the correction for sample selection is presented in table 7, column 5 (panel A1). The change in the coefficient is small: it moves from 8.1 percent (in column 3) to 9.2 percent. To check the sensitivity of this result to functional form assumptions, I have run similar specifications controlling for higher-order terms of the selection probability (cubic and fourth power), and the estimates do not change much.

I  applied  the  same  sample  correction  procedure  to  the  other  specifications.  Conventional 
estimates  and  the  selection-corrected  estimates  are  similar.  This  suggests  that  selection  bias 
does  not  have  a  big  impact  on  the  estimation  of  the  coefficient  of  education. 

A problem with this procedure is that I use functions of the year-of-birth and region-of-birth interactions both as instruments and as controls. The second stage is still overidentified (there are now 12 instruments for 3 parameters to estimate), but the identification is fragile. It rests on the fact that I use several instruments to measure the program intensity. This leads, in particular, to a lower F-statistic in the first stage and to larger standard errors in the second stage. An alternative approach is to impute an income for self-employed individuals and examine whether the results change when the estimation is performed in this "completed sample."

To  this  end,  I  use  the  income  and  expenditure  module  of  the  1993  SUSENAS  survey.  Over 
50,000  individuals  are  included  in  this  module.  The  SUSENAS  does  not  report  the  place  of  birth 
of  the  individuals.  Households  report  the  members'  occupations  and  the  sector  of  activity  from 
which  they  derive  their  main  source  of  income.  In  addition,  the  survey  collects  information  on 
wages  received  by  each  member  of  the  household,  income  derived  from  the  sale  of  products  and 
services of the household business (or farm), and operating expenses related to it. I define the income accruing to the household from each household business as the difference between sales and expenses for this business. I calculate the average income derived from the main activity of the household for cells defined by sector (9 industrial sectors and services and 4 types of agricultural activities), status, and urban/rural residence. To check the consistency between the two sources, I report in table 1 the average monthly income of wage earners and their average income imputed using this procedure. The two figures are quite close. The difference is explained by the fact that the SUSENAS was conducted in 1993, while the SUPAS was conducted in 1995. The average monthly income for self-employed individuals is smaller than for wage earners.

The goal of this exercise is to examine whether the results are sensitive to the inclusion of self-employed individuals in the estimation. Therefore, I "complete" the sample by defining the dependent variable as the logarithm of monthly earnings if they are recorded in the SUPAS data (for individuals working for pay) and, for self-employed individuals, as the logarithm of the average SUSENAS income in the individual's occupation cell, multiplied by a wage inflation factor defined as the ratio of the average wage from the SUPAS to the average income of wage earners imputed from the SUSENAS.
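The completion step can be sketched as follows, under simplifying assumptions: cell keys are hypothetical (sector, status, urban/rural) tuples, household business income is sales minus expenses as described above, and the inflation factor is the ratio of the SUPAS average wage to the SUSENAS-imputed average for wage earners. All names and numbers are illustrative.

```python
import math
from collections import defaultdict


def cell_average_income(businesses):
    """businesses: iterable of (cell, sales, expenses); income = sales - expenses."""
    totals, counts = defaultdict(float), defaultdict(int)
    for cell, sales, expenses in businesses:
        totals[cell] += sales - expenses
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


def completed_log_earnings(people, cell_income, inflation_factor):
    """people: iterable of (monthly_wage or None, cell).

    Wage earners keep their reported earnings; self-employed people get the
    inflated average income of their cell from the other survey.
    """
    out = []
    for wage, cell in people:
        if wage is not None:
            out.append(math.log(wage))
        else:
            out.append(math.log(cell_income[cell] * inflation_factor))
    return out


cell = ("food crops", "self-employed", "rural")    # hypothetical cell key
income = cell_average_income([(cell, 80.0, 30.0), (cell, 60.0, 10.0)])
factor = 120.0 / 100.0                             # hypothetical SUPAS / SUSENAS ratio
print(completed_log_earnings([(150.0, None), (None, cell)], income, factor))
```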

The results are presented in table 7, panel B2. They must be compared to the results in panel A2, where the dependent variable is the logarithm of monthly earnings of wage earners. In all cases, the estimates using the completed sample are smaller than those using the sample of wage earners. They are quite close, except in the specification controlling for the water and sanitation program (column 3), where the estimate drops to 3.5 percent. This particular result is surprising, but the fact that the returns for the completed sample are somewhat smaller than for the sample of wage earners suggests that returns to education might be higher in the wage sector than among the self-employed.

This  additional  evidence  supports  the  idea  that  sample  selection  is  not  an  important  problem 
in  this  application.  A  statistical  sample  selection  correction  procedure  does  not  change  the 
estimates  significantly.  Using  the  complete  sample  by  imputing  an  income  to  self-employed 
individuals  also  produces  similar  estimates  in  most  specifications. 
VI. Comparing Costs and Benefits

The estimates of the program's effect on wages can be used to compare the costs of building and operating the new schools to the additional wealth they generated, under the assumption that the increase in wages represents an increase in human capital and that general equilibrium effects were not important. Note that in this case, the increase in wages underestimates the total benefit generated by the program: the increase in education is likely to affect other outcomes (fertility, child morbidity and mortality, etc.). These calculations require additional assumptions and should be taken with considerable caution. Nevertheless, it is useful to evaluate the magnitude of the consequences of such a large-scale program: the discounted sum of the cost of construction alone from 1973 to 1979 represented more than 2 percent of Indonesia's GDP in 1973.

The  presidential  instructions  indicated  each  year  the  costs  of  building  the  new  schools  in 
each  region  and  the  number  of  teachers  to  be  allocated  in  each  school.  The  total  cost  of  building 
over 61,000 schools reached slightly over 5 billion 1990 U.S. dollars. Detailed information on the cost of education in Indonesia has been collected by Daroesman (1971). She used a survey of schools that she had conducted and various administrative sources. I used her data on teachers' salaries, recurrent expenditures, and the costs of teacher training in 1971. Using my data, I then estimated the average wage of a primary school teacher in 1995, and I linearly interpolated the wage between 1971 and 1995 (this implies annual growth slightly higher than the growth of Indonesia's GDP over the period). School buildings constructed at the time were meant to remain active for twenty years (Daroesman (1971)). I am, therefore, assuming that the program lasted for twenty years. These and other assumptions are summarized in table 8, panel B.

In summary, yearly costs are calculated using the following formula:

$$C(t) = r K + r\,TC + W(t) \times 1.25,$$

where $K$ is the total capital cost, $TC$ is the total training cost, $W(t)$ represents the sum of teachers' wages at date $t$, and $r$ is the real interest rate (discount rate). Finally, I present the cost-benefit analysis for two different assumptions about the deadweight burden of taxation (0.2 and 0.6).
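The yearly cost formula translates directly into code. This is a sketch: the linear interpolation of the teacher wage bill mirrors the 1971-1995 interpolation described above, and treating the deadweight burden of taxation as a proportional markup on costs is an assumption about how the 0.2 and 0.6 scenarios enter, not something stated in the text. All inputs are hypothetical.

```python
def interpolate(t, t0, v0, t1, v1):
    """Linear interpolation between (t0, v0) and (t1, v1)."""
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def yearly_cost(r, capital_cost, training_cost, wage_bill, deadweight=0.0):
    """C(t) = r*K + r*TC + W(t)*1.25, optionally marked up by the
    deadweight burden of taxation (assumed proportional to costs)."""
    base = r * capital_cost + r * training_cost + wage_bill * 1.25
    return base * (1.0 + deadweight)


# Hypothetical figures in millions of 1990 U.S. dollars.
w_1983 = interpolate(1983, 1971, 20.0, 1995, 44.0)   # teacher wage bill in 1983
print(w_1983, yearly_cost(0.05, 1000.0, 200.0, w_1983, deadweight=0.2))
```

The terms $rK$ and $r\,TC$ convert the one-time capital and training outlays into a constant annual charge, which is why costs fall to the annuity payment once the schools close.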
Further assumptions are needed to compute the yearly benefits of the program. First, an important assumption is that the increase in wages attributed to the program represents an increase in the productivity of labor (and that there are no general equilibrium effects on the returns to education). Second, I estimated the effect of the program on men who work for a wage; I assume that the effect is the same on (working) women and on self-employed people. I also assume that the share of total labor income going to people of any given age is constant across years and is equal to the share of total wages going to this cohort in 1995 (which I can calculate from my data). Thus, I estimate the benefit of the program at date $t$ for a cohort $c$ using the following formula:

$$B(c, t) = a \cdot \mathrm{GDP}(t) \cdot s(c, t) \cdot E(c),$$

where $a$ is the share of labor in GDP, $s(c, t)$ is the fraction of total wages earned by cohort $c$ in year $t$, and $E(c)$ is the estimated average effect of the program on cohort $c$. It is 0 for people 13 or older in 1974 (they were not exposed to the program). For people aged 2 to 12 in 1974, the effect is given by the coefficients in table 4, multiplied by the average intensity of the program. For the following cohorts, I assume that the effect of the program decreases at the rate of population growth. To obtain the total benefits for each year, I take the sum of these benefits over all cohorts.
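A sketch of the benefit calculation for one year. The cohort-effect rule follows the text (zero for ages 13 and above in 1974, the estimated coefficient times mean intensity for ages 2 to 12); the way later cohorts' effects shrink, dividing the age-2 effect by one plus the population growth rate for each later year of birth, is one possible reading of "decreases at the rate of population growth" and is an assumption. Inputs are hypothetical.

```python
def cohort_effect(age_1974, coef_by_age, mean_intensity, pop_growth):
    """E(c) for a cohort identified by its age in 1974 (negative = born later)."""
    if age_1974 >= 13:          # not exposed to the program
        return 0.0
    if age_1974 >= 2:           # estimated program-by-age coefficient
        return coef_by_age[age_1974] * mean_intensity
    # Later cohorts: the age-2 effect shrunk by population growth (assumption).
    return coef_by_age[2] * mean_intensity / (1 + pop_growth) ** (2 - age_1974)


def total_benefit(labor_share, gdp_t, wage_share_t, effect_by_cohort):
    """B(t) = sum over cohorts c of a * GDP(t) * s(c, t) * E(c)."""
    return sum(
        labor_share * gdp_t * share * effect_by_cohort[c]
        for c, share in wage_share_t.items()
    )


coefs = {age: 0.02 for age in range(2, 13)}        # hypothetical coefficients
effects = {age: cohort_effect(age, coefs, 1.9, 0.02) for age in (20, 5, 0)}
print(effects)
print(total_benefit(0.5, 1000.0, {20: 0.1, 5: 0.2}, effects))
```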

In figure 11, I show the annual costs and benefits of the program from 1973 to 2060, evaluated
in  millions  of  1990  U.S.  dollars.  The  assumptions  for  this  picture  are  a  GDP  growth  of  2  percent 
after  1996,  a  deadweight  burden  of  0.2,  and  a  discount  rate  of  5  percent.  The  estimates  of 
the  program  effect  are  those  from  table  4,  column  5.  Benefits  are  very  low  from  1973  to  1987 
because  the  generations  exposed  to  the  program  have  not  yet  entered  the  job  market.  After 
1989,  benefits  increase  rapidly:  each  year,  a  new  cohort  exposed  to  the  program  enters  the  job 
market.  After  2015,  when  the  generation  exposed  to  the  program  starts  leaving  the  job  market, 
annual  benefits  decrease  until  they  reach  0  by  2050  (when  the  last  cohorts  educated  in  these 
schools  leave  the  job  market).  Costs  increase  rapidly  from  1973  to  1979,  as  more  schools  are 
built  and  more  teachers  need  to  be  paid.  Once  the  total  stock  is  built,  costs  increase  only  as 
teachers'  salaries  grow.  In  1995,  when  the  schools  are  closed,  the  costs  fall  to  a  very  low  level 
corresponding to the annuity payment on initial capital expenditures. In 1996, benefits exceed costs.

The relevant variable for the cost-benefit calculation is the discounted sum of net benefits (defined as the difference between yearly benefits and yearly costs) taken to infinity. Figure 12 plots this series (discounted from 1973) as a fraction of 1973 GDP. From 2005 on, the discounted sum of net benefits is positive and increases rapidly. The program's compounded net benefits in 2050 reach 18 percent of 1973's GDP. The program requires more than thirty years to yield positive returns, but these returns are high.
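The series in figure 12 is a running discounted sum. A minimal sketch, assuming hypothetical yearly benefit and cost dictionaries in the same units as 1973 GDP:

```python
def discounted_net_path(benefits, costs, r, gdp_1973, base_year=1973):
    """Running discounted sum of (benefit - cost) to base_year,
    expressed as a fraction of 1973 GDP."""
    total, path = 0.0, {}
    for year in sorted(benefits):
        total += (benefits[year] - costs[year]) / (1 + r) ** (year - base_year)
        path[year] = total / gdp_1973
    return path


# Hypothetical flows: costs up front, benefits later.
benefits = {1973: 0.0, 1974: 0.0, 1975: 242.0}
costs = {1973: 50.0, 1974: 55.0, 1975: 0.0}
print(discounted_net_path(benefits, costs, 0.10, 1000.0))
```

The year in which the path first turns positive corresponds to the crossing point read off figure 12.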

In table 8, I present an evaluation of the program's returns for the first two specifications estimated in table 4 and three different assumptions about the growth rate of GDP from 1996 to 2050. I present the discounted sum of program benefits in millions of 1990 U.S. dollars, as a fraction of 1973 GDP, and as a fraction of costs (calculated by dividing the discounted sum of program benefits by the cost of constructing the new buildings and training the new teachers). To evaluate the contribution of economic growth to the benefits of the program, I also present these results under the assumption that Indonesia's GDP grew at a rate of 2 percent annually from 1973 to 2050.

The cost-benefit analysis is sensitive to the specification chosen for the estimation of the program effect and to the assumptions about future growth rates in Indonesia. Nevertheless, three main points emerge from this analysis. First, a school construction program takes a very long time to generate positive returns (because the costs are incurred early on, while the benefits are spread over a generation's lifetime). Second, the returns generated are large. By 2050, in the smallest estimate, the program will have generated nine times as much revenue as its initial cost. Third, the benefits are to a large extent driven by the rapid growth of Indonesia's GDP from 1973 to 1997 (because each year's benefits are a fraction of that year's GDP). If the growth rate had been very low from 1973 until today, the net present value of the program would actually have been slightly negative, according to all specifications but one. Therefore, this program is justified ex post, given the growth Indonesia actually experienced. Investing in education is much more valuable, from a government point of view, if it expects fast subsequent growth.

The last three lines in table 8, panel A, indicate the internal rate of return of the program
(the interest rate at which the net present value is 0). Using the actual GDP growth between
1973 and 1997, they range from 8.8 percent to 12 percent, depending on the specification and
the assumptions chosen. These numbers are high, but reasonable. The evidence therefore suggests
that the program was a profitable investment, with an internal rate of return substantially
higher than the average interest rate on government debt in Indonesia over the period. The
profitability of this investment would have been much less obvious if Indonesia's economic growth
had been slower.
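The internal rate of return in table 8 is the discount rate at which the net present value of the program's net-benefit stream equals zero. A minimal Python sketch finds that rate by bisection; the function names, the [0, 1] search bracket and the toy stream are assumptions for illustration, not the paper's computation or data.

```python
from typing import Sequence


def npv(flows: Sequence[float], rate: float) -> float:
    """Net present value of yearly flows, discounting year t by (1 + rate) ** t."""
    return sum(f / (1.0 + rate) ** t for t, f in enumerate(flows))


def irr_bisection(flows: Sequence[float], lo: float = 0.0, hi: float = 1.0,
                  tol: float = 1e-10, max_iter: int = 200) -> float:
    """Rate at which NPV is zero, found by bisection on [lo, hi].

    Assumes NPV changes sign exactly once on the bracket, which holds for a
    stream with all costs up front and all benefits later.
    """
    f_lo = npv(flows, lo)
    if f_lo * npv(flows, hi) > 0:
        raise ValueError("NPV does not change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = npv(flows, mid)
        if f_lo * f_mid <= 0:
            hi = mid                # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid   # root lies in [mid, hi]
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical stream: pay 100 today, receive 121 two years later.
print(round(irr_bisection([-100.0, 0.0, 121.0]), 6))  # 0.1
```

Truncating the stream at a final year plays the role of the 2050 "infinity point" used in the paper. Bisection is chosen here only because it is simple and robust when the net present value crosses zero once; it is not necessarily the method used to produce table 8.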

VII.  Conclusion 

The INPRES program of primary school construction led to an increase in educational attainment
in Indonesia. The estimates of the effect of the program on the education of children aged
2 to 6 in 1974 range from 0.12 to 0.19 years for each new school built per 1,000 children. In
particular, it encouraged a significant proportion of the population to complete more years
of primary education. This increase translated into an increase in wages of 1.5 to 2.7 percent
for each additional school built per 1,000 children. Estimates of economic returns to education
using this exogenous variation in schooling (assuming that the program had no effect other than
to increase the quantity of education) range from 6.8 percent to 10.6 percent. These numbers
can be interpreted as weighted averages of returns to basic education for people who are affected
by the instruments, a group that is likely to include the poorest segment of the population.

The findings reported here are important because they show that, in Indonesia, an unusually
large government-administered intervention was effective in increasing both education and
wages. This intervention was meant to increase the quantity of education (measured, in the
INPRES instructions, by enrollment rates). It is sometimes feared that the deterioration in the
quality of education that could result from this type of program could offset any gain in quantity.
However, the program was effective in increasing not only education levels, but also wages. This
suggests that the combined effect of quality and quantity changes in education was to increase
human capital. I presented some evidence that the quality of education does not seem to have
deteriorated significantly because of the program. But even if it did, the effect on wages shows
that this decline was not sufficient to offset the impact of the increase in quantity.

I presented evidence in favor of the internal validity of these results. I have shown that
changes between cohorts were not systematically different in low- and high-program regions
before the program started, and I have tried to control for the two variables (the water and
sanitation program and the enrollment rate in 1971) whose omission was most likely to bias the
estimates. It remains possible that these results cannot be generalized to different situations. In
particular, the emphasis on education in Indonesia at the time of the program created a context
particularly favorable to its success.

The cost-benefit analysis suggests that the cost of the program was smaller than the discounted
lifetime gains in the wages of the generation who were exposed to it. It should be
noted, however, that this alone does not tell us whether the policy should have been implemented or
financed by the government. To answer this question, we need to determine whether the levels
of education would have been below the social optimum in the absence of the program. This
would be the case if altruism were imperfect within the family (so that parents, who in practice
make educational choices, do not extract the full benefit of it), if households were facing credit
constraints, or if there were positive spillovers of education. Answering these questions is beyond
the scope of this paper, but future research could use the variations in education induced by
the INPRES program to estimate the impact of education on other outcomes (fertility and child
health, for example). Future research could also address the question of the social returns to
education by examining whether the income of the cohorts not exposed to the program increased
faster over time in regions where the program caused increases in the education of the younger
cohorts.
[1] Kristin F. Butcher and Anne Case (1994) use sibling composition as an instrument for
women's education in the United States. Ashish Garg and Jonathan Morduch (1998) find that
sibling composition affects education in poor households in Ghana, but they also find evidence
of an impact of sibling composition on health, which is likely to affect income directly.

[2] Maluccio (1998) uses a sample of 250 wage earners in the Philippines and relies on distance
to school reported by the parents in 1978 as an instrument for the education of the individuals
interviewed in 1994.

[3] The matching was complicated by the fact that some districts changed boundaries or names.
I used maps of Indonesia to resolve these ambiguities.

[4] This share does not include teachers' salaries, which are paid out of the routine component
of the budget.

[5] The program did not stop at that date. I chose to consider the construction between 1973-74
and 1978-79 for the following reasons: it corresponds to the duration of the second five-year
plan, a very high primary enrollment rate was achieved in 1978, and people born in 1972 (the
youngest cohort in my sample) turned 7 in 1979 and were therefore fully exposed to the program.
The program slowed down considerably after 1978-79.

[6] I chose 1971 as a base year for the population of children because 1971 was a census year.
In census publications, the district population is broken down by wide age categories, so I have
to use children aged 5-14 as the reference group, instead of children aged 7-12.

[7] The transmigration regions are the areas where the government of Indonesia encouraged
new settlement as a solution to the overcrowding of Java.

[8] The 1971 census publications do not give enrollment rates among children aged 7 to 12, but
only the fraction of the overall population attending school.

[9] To correct for measurement error in enrollment rates, I use an uncorrelated measure of
regional education as an instrument for the enrollment rate in the population in 1971: the average
education of individuals born between 1950 and 1960.

[10] Age at school entry varies in developing countries (Paul Glewwe and Hanan Jacoby (1995)),
and grade repetition is important in Indonesia (Jere R. Behrman and Anil B. Deolalikar (1991)).
In table 1, panel C, I show information from the Indonesian Family Life Survey (IFLS) data set:
20 percent of children have repeated at least one grade. I estimated the number of people still
attending primary school after ages 12 and 13 in the IFLS by combining information on the age at
which the individual left school with the information about grade repetition. At age 13 or above,
16 percent of students were still in primary school, but only 7 percent were still in primary school
a year later.

[11] If I had data on region of education, I could use region of birth as an instrument for region
of education.

[12] To make Wald estimates meaningful, estimates in table 3 are presented for the sample with
valid wage data. High-program regions are defined as regions where the residual of a regression
of the number of schools on the number of children is positive.
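The high/low split described in footnote 12 can be sketched as a one-regressor OLS fit followed by a sign test on the residuals. This minimal Python sketch assumes the regression includes an intercept and uses district counts in levels (the footnote specifies neither); the function name and the input numbers are hypothetical.

```python
from typing import List


def high_program_flags(schools: List[float], children: List[float]) -> List[bool]:
    """Flag regions whose OLS residual (schools on children, with intercept) is positive."""
    n = len(schools)
    mean_x = sum(children) / n
    mean_y = sum(schools) / n
    sxx = sum((x - mean_x) ** 2 for x in children)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(children, schools))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(children, schools)]
    return [r > 0 for r in residuals]


# Hypothetical districts: schools built vs. children (arbitrary units).
print(high_program_flags([2.5, 3.5, 5.5, 8.5], [1.0, 2.0, 3.0, 4.0]))
# [True, False, False, True]
```

Classifying on the residual rather than on raw school counts means a district counts as "high program" only if it received more schools than its child population alone would predict.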

[13] See Rosenzweig and Wolpin (1988a), Pitt, Rosenzweig and Gibbons (1993), and Gertler and
Molyneaux (1994).
[14] James J. Heckman, Lance Lochner and Christopher Taber (1998) make this point in their
analysis of the effects of tuition policy in the United States.

[15] Card (1995) assumes concave returns to schooling. I abstract from this to focus on the most
important assumptions underlying the identification strategy in this context.

[16] See Joshua Angrist (1995b) for an analysis of general equilibrium effects of a rapid increase
in education, Jere R. Behrman and Nancy Birdsall (1983) and Card and Krueger (1992) for the
impact of school quality on returns to education, and Andrew D. Foster and Mark R. Rosenzweig
(1996) for the influence of economic growth on returns to education.

[17] This implies that $S_j = \frac{1}{2}(S_{j0} + S_{jk})$. Extending the calculations to several
cohorts is conceptually easy but makes notation cumbersome.

[18] Using equation (2) and imposing the assumption that the old cohort did not anticipate
the program, I obtain $E_0 S_{jk} = E_0 S_{j0} - \ldots$, where $D_1 = E_0(\mu_{jk} - \mu_{j0})$.
Replacing this in the expected value of equation (4) and substituting $E_0 S_{j0}$ in equation (2),
I obtain $(\phi - 2\beta_1) S_{j0} = -\alpha_1 Z_{j0} + \beta_3 q_{j0} + 2\beta_2 E_0 S + D_2$, with
$D_2 = E_0 V_j - \ldots$. I can then combine this with equation (2) to obtain
$(\phi - \beta_1)(S_{jk} - S_{j0}) = -\alpha_1 (Z_{jk} - Z_{j0}) + \beta_3 (q_{jk} - q_{j0}) + 2\beta_2 (E_k S - E_0 S) + D_3$,
with $D_3 = E_k V_j - E_0 V_j - (\mu_{jk} - \mu_{j0}) + \ldots$.

[19] The potential decline in the returns to education in regions that received more schools
affects all cohorts, whether or not they were exposed to the program. Therefore, I will observe
a (spurious) negative relationship between $y_{jk} - y_{j0}$ and the program for $k > 13$.

[20] This is equivalent to a weighted least squares estimation of equation (6).

[21] The number of schools per child was negatively correlated with the number of children,
and I want to abstract from any time-varying effect of region population. It turns out to be
important in the wage equation.

[22] Reverse causality (improvement in performance causing improvement in inputs) is not a
potential problem in this case (unlike in Gertler and Molyneaux (1994) and in Pitt et al. (1993)),
because during the entire period, the allocation rule was explicitly based upon enrollment in
1972 (i.e., before the INPRES program).
[23] Using enrollment in primary schools in 1973 divided by the population of children gives similar
results, but this variable is available for fewer provinces than the enrollment rate in 1971. I also
ran the same regressions controlling in addition for average education in the 1950-1960 cohort
(which corresponds more closely to the framework outlined in section III) and found similar
results.

[24] I have plotted the coefficients corresponding to the specification in table A1, column 2.

[25] The government may care more about those children who, in the absence of the program,
would have had the least education. However, I cannot really answer this question by examining
how the distribution of education is affected by the program (because the difference in quantiles
is not the quantile of the difference).
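The parenthetical point in footnote 25 can be seen in a toy example: two outcome distributions can share every quantile while the child-by-child changes have a very different median. The sketch below uses hypothetical paired outcomes for five children. Such pairs are never jointly observed in real data, which is precisely why the quantile of the difference is not identified from the two marginal distributions.

```python
from statistics import median

# Hypothetical years of education for the same five children without and with
# the program (pairs like these are never jointly observed in practice).
without_program = [1, 2, 3, 4, 5]
with_program = [5, 1, 2, 3, 4]

# Difference in medians of the two marginal distributions.
diff_of_medians = median(with_program) - median(without_program)

# Median of the child-by-child differences.
median_of_diffs = median([w - wo for w, wo in zip(with_program, without_program)])

print(diff_of_medians)  # 0
print(median_of_diffs)  # -1
```

Both distributions are the same set of values, so every difference in quantiles is zero, yet four of the five hypothetical children lose a year.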

[26] The shape of this function is not affected by controlling for interactions of the enrollment
rate (or the water and sanitation program) and year-of-birth dummies.

[27] The negative difference in differences at the senior high school level may indicate that some
variable predicting the probability of attending senior high school is omitted from this regression
(and changed in low-program regions more than in high-program regions).

[28] For example, Angrist and Imbens (1995) find that compulsory attendance laws in the United
States induce a fraction of the sample to complete some college as a consequence of constraining
them to complete high school.

[29] I have also estimated this equation using LIML, which is median unbiased and does not
suffer from this problem. LIML results are very similar to 2SLS results, providing evidence that
the weak-instruments problem is not serious in this application.

[30] For example, in a study on returns to education in Indonesia, Behrman and Anil B. Deolalikar
(1993) introduce household fixed effects in an earnings function and report estimates that
are much lower than the corresponding OLS estimates. Note, however, that introducing fixed
effects into a wage equation reduces the sample size and might exacerbate the attenuation bias
due to measurement error (Zvi Griliches (1979)).
[31] Note that the OLS coefficients indicate a correlation and are not causal estimates; these
high coefficients could be an artifact of selection into secondary education.

[32] I have not presented the 2SLS estimate when the F-statistic for the joint significance of the
instruments in the first stage was below 2, because it would not be interpretable.

[33] Note that I partition the sample according to education, which is an endogenous variable.
This is not ideal, but it is not likely to be a real problem in this case, because the decision to go
to senior high school is quite different from the decision to complete one more year of primary
school.

[34] The data show that the program reduced pupil/teacher ratios in densely populated regions,
which is expected given that it increased the number of teachers but not the number of pupils
in these regions. This suggests that pupil/teacher ratios did not have a big impact on effective
school quality.

[35] Individuals who did not work at least one hour in the previous week do not report a branch
of activity. They are, therefore, still excluded from this sample.

[36] The loss incurred by the older generations, whose returns to education went down because
of the program, is not taken into account in the reduced-form estimates of the program's effects.

[37] More generally, the cost-benefit analysis should ideally be based on the consumer surplus
generated by the program. It includes the additional cost for children of going to school as well
as the benefits for all children of going to a school that is now closer to their home.

[38] And indeed, in 1997, most INPRES schools constructed in the mid-1970s were either closed
or crumbling.

[39] The assumption that the government pays an annuity on the capital and training
expenditures and pays the wage bill every year is, of course, irrelevant for the calculation of the
discounted sum of net benefits or the internal rate of return.


[40] So for a cohort born y years after 1972, the effect would be calculated as $E(c) = \ldots$ if
the population growth rate were constant.

[41] After 2050, the benefits are stable, and I therefore consider 2050 as the infinity point.
[42] A related point is made by Mark Bils and Peter Klenow (1998), who argue that the correlation
between growth and education in cross-country growth regressions is likely to be driven
by the fact that high expected growth makes investment in education more profitable.
Figure 1: Regional growth in education and log wages across cohorts and program intensity
(Per capita denotes per 1,000 children)
Panel A1: Experiment of interest: education
Panel A2: Experiment of interest: log(wages)
Panel B1: Control experiment: education
Panel B2: Control experiment: log(wages)
FIGURE 2 - COEFFICIENTS OF THE INTERACTIONS AGE IN 1974 * PROGRAM INTENSITY IN THE REGION OF BIRTH
IN THE EDUCATION EQUATION (horizontal axis: age in 1974)

FIGURE 3A - COEFFICIENTS OF THE INTERACTION AGE IN 1974 * PROGRAM INTENSITY IN THE REGION OF BIRTH
IN THE WAGE AND EDUCATION EQUATIONS

FIGURE 3B - COEFFICIENTS OF THE INTERACTION AGE IN 1974 * PROGRAM INTENSITY IN THE REGION OF BIRTH
IN THE WAGE AND EDUCATION EQUATIONS (SMOOTHED)
FIGURE 4 - CDF OF EDUCATION, HIGH PROGRAM REGION

FIGURE 5 - CDF OF EDUCATION, LOW PROGRAM REGION

FIGURE 6 - BETWEEN-COHORT DIFFERENCES IN CDF OF EDUCATION

FIGURE 7 - DIFFERENCE IN DIFFERENCES IN CDF (WITH 95% CONFIDENCE INTERVAL)

FIGURE 8 - DIFFERENCE IN DIFFERENCES IN CDF (ESTIMATED FROM LINEAR PROBABILITY MODEL) WITH 95% CONFIDENCE INTERVAL
FIGURE 9 - COEFFICIENTS OF THE INTERACTIONS AGE IN 1974 * PROGRAM INTENSITY IN THE REGION OF BIRTH
IN THE WAGE AND EDUCATION EQUATIONS (SAMPLE: INDIVIDUALS WHO COMPLETED MORE THAN 9 YEARS OF EDUCATION)

FIGURE 10 - RETURNS TO EACH YEAR OF EDUCATION (OLS ESTIMATE)

FIGURE 11 - ANNUAL COSTS AND BENEFITS OF THE PROGRAM (MILLION 1990 US DOLLARS)

FIGURE 12 - DISCOUNTED SUM OF PROGRAM'S NET BENEFIT (MILLION 1990 US DOLLARS)
Assumptions: Deadweight burden: 0.2. GDP growth after 1973: 2%. Program effect is estimated
without controlling for enrollment rate (table 5, column 7).


TABLE 1 -- DESCRIPTIVE STATISTICS

Panel A: Individual Level Means                                                      Mean
Education (whole sample, N = 152,989)                                                7.98
Education (sample with valid wage data, N = 60,663)                                  9.00
INPRES schools built per 1,000 children                                              1.98
INPRES schools built per 1,000 children (sample with valid wage data)                1.89
INPRES schools built per 1,000 children (high-program regions)                       2.44
INPRES schools built per 1,000 children (low-program regions)                        1.54
Log(hourly wage)                                                                     6.87
Monthly earnings (SUPAS 1995), thousands of Rupiah                                   13
Monthly earnings (SUSENAS 1993) of wage earners, thousands of Rupiah                 205
Monthly earnings (SUSENAS 1993) of self-employed individuals, thousands of Rupiah    152

Panel B: District Level Means (N = 293)
INPRES schools constructed (1973/74 - 1978/79)                                       222
INPRES schools constructed per 1,000 children (1973/74 - 1978/79)                    2.34
Number of teachers in 1973/74                                                        1,530
Number of teachers in 1978/79                                                        2,082
Number of schools in 1973/74                                                         219
Fraction of the population attending school in 1971 (census)                         0.174
Enrollment rate in primary school in 1973 (Ministry of Education and Culture)        0.68

Panel C: Indonesian Family Life Survey, Individuals Born Between 1950 and 1972
(all numbers are percentages)
Proportion of individuals who migrated between birth and age 12                      8.5
Proportion of people who repeated at least one grade in primary school               20
Proportion of people completing more than primary who repeated at least one grade
in primary school                                                                    6.0
Proportion of individuals who attended primary school after age 12 (estimated)       15.8
Proportion of individuals who attended primary school after age 13 (estimated)       6.8
Proportion of individuals born 1950-1961, completing primary or less, who left
school after 1974                                                                    2.8
Proportion of individuals born 1962-1966, completing primary or less, who left
school after 1974                                                                    24.5

Sources: IFLS, SUPAS, SUSENAS, INPRES instructions, Census (1971), Ministry of Education and Culture.


TABLE 2 - THE ALLOCATION OF SCHOOLS

Dependent variable: log(INPRES schools)

Log of number of children aged 5-14 in the region                    0.78
                                                                     (0.027)
Log(1 - enrollment rate in primary school in 1973)                   0.12
                                                                     (0.038)
Number of observations                                               255
R squared                                                            0.78

Notes: Standard errors are in parentheses.
The dependent variable is the log of the number of INPRES schools built between 1973 and 1978.
The enrollment rate in primary school is the number of children enrolled in primary school in
1973 (obtained from the Ministry of Education and Culture) divided by the number of children
aged 5-14 in the region in 1973.
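Because both sides of the table 2 regression are in logs, its coefficients are elasticities. A small Python sketch of what they imply for the relative number of schools allocated to two hypothetical districts; only the two point estimates come from the table, the function name and district values are illustrative, and the regression constant cancels in the ratio.

```python
import math

# Point estimates from table 2.
ELASTICITY_CHILDREN = 0.78   # on log(children aged 5-14)
ELASTICITY_NONENROLL = 0.12  # on log(1 - 1973 enrollment rate)


def predicted_school_ratio(children_a: float, children_b: float,
                           enroll_a: float, enroll_b: float) -> float:
    """Predicted INPRES schools in district A relative to district B."""
    log_ratio = (ELASTICITY_CHILDREN * math.log(children_a / children_b)
                 + ELASTICITY_NONENROLL * math.log((1 - enroll_a) / (1 - enroll_b)))
    return math.exp(log_ratio)


# Same number of children, but 50% vs 75% primary enrollment in 1973.
print(round(predicted_school_ratio(10_000, 10_000, 0.50, 0.75), 3))  # 1.087
# Twice as many children, same enrollment.
print(round(predicted_school_ratio(20_000, 10_000, 0.60, 0.60), 3))  # 1.717
```

Holding children constant, the district with 50 percent enrollment is predicted to receive about 8.7 percent more schools; doubling the child population multiplies predicted schools by about 1.72 (2^0.78), so schools rise less than proportionally with children.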


TABLE 3 -- MEANS OF EDUCATION AND LOG(WAGE) BY COHORT AND LEVEL OF PROGRAM CELLS

(High/Low: level of program in region of birth)

                          Years of education                  Log(wages)
                          High      Low       Difference      High      Low       Difference
                          (1)       (2)       (3)             (4)       (5)       (6)

Panel A: Experiment of Interest
Aged 2 to 6 in 1974       8.49      9.76      -1.27           6.61      6.73      -0.12
                          (0.043)   (0.037)   (0.057)         (0.0078)  (0.0064)  (0.010)
Aged 12 to 17 in 1974     8.02      9.40      -1.39           6.87      7.02      -0.15
                          (0.053)   (0.042)   (0.067)         (0.0085)  (0.0069)  (0.011)
Difference                0.47      0.36      0.12            -0.26     -0.29     0.026
                          (0.070)   (0.038)   (0.089)         (0.011)   (0.0096)  (0.015)

Panel B: Control Experiment
Aged 12 to 17 in 1974     8.00      9.41      -1.41           6.87      7.02      -0.15
                          (0.054)   (0.042)   (0.078)         (0.0085)  (0.0069)  (0.011)
Aged 18 to 24 in 1974     7.70      9.12      -1.42           6.92      7.08      -0.16
                          (0.059)   (0.044)   (0.072)         (0.0097)  (0.0076)  (0.012)
Difference                0.30      0.29      0.013           0.056     0.063     0.0070
                          (0.080)   (0.061)   (0.098)         (0.013)   (0.010)   (0.016)

Note: The sample consists of individuals who earn a wage. Standard errors are in parentheses.
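The simple difference-in-differences and Wald calculations behind table 3 can be reproduced from the printed cells. This Python sketch assumes the Wald estimate is the ratio of the log-wage difference in differences to the education difference in differences (the standard definition); the function names are illustrative. Recomputing the differences from the rounded cell means gives 0.11 years and 0.03 log points instead of the printed 0.12 and 0.026, so the ratio is taken from the printed difference-in-differences entries.

```python
# Table 3, panel A cell means: (program level in region of birth, cohort).
# "young" = aged 2 to 6 in 1974, "old" = aged 12 to 17 in 1974.
education = {("high", "young"): 8.49, ("low", "young"): 9.76,
             ("high", "old"): 8.02, ("low", "old"): 9.40}
log_wage = {("high", "young"): 6.61, ("low", "young"): 6.73,
            ("high", "old"): 6.87, ("low", "old"): 7.02}


def diff_in_diff(cells: dict) -> float:
    """(young - old) in high-program regions minus (young - old) in low-program regions."""
    return ((cells[("high", "young")] - cells[("high", "old")])
            - (cells[("low", "young")] - cells[("low", "old")]))


def wald(reduced_form: float, first_stage: float) -> float:
    """Return to a year of education implied by the two differences in differences."""
    return reduced_form / first_stage


print(round(diff_in_diff(education), 2))  # 0.11 (printed table value: 0.12)
print(round(diff_in_diff(log_wage), 2))   # 0.03 (printed table value: 0.026)
print(round(wald(0.026, 0.12), 3))        # 0.217, from the printed entries
```

This two-by-two ratio is imprecise (the education difference in differences has a standard error of 0.089) and is not the 2SLS estimate reported in table 6.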
TABLE 6 -- EFFECT OF EDUCATION ON LABOR MARKET OUTCOMES: OLS AND 2SLS ESTIMATES


Method 


Instrument 


(1) 


(2) 


(3) 


(4) 


PANEL  A:  Sample  of  Individuals  Who  Work  for  a  Wage 

PANEL A1: Dependent variable: log(hourly wage)
OLS 

2SLS     Year  of  birth  dummies*  program  intensity  in  region  of  birth 


2SLS     (Aged  2-6  in  1974)*program  intensity  in  region  of  birth 


0.0776 
(0.000620) 

0.0675 

(0.0280) 

[0.96] 

0.0752 
(0.0338) 


0.0777  0.0767 

(0.000621)       (0.000646) 


0.0809 

(0.0272) 

[0.9] 

0.0862 
(0.0336) 


0.106 

(0.0222) 

[0.93] 

0.104 
(0.0304) 


0.0908 

(0.0541) 

[0.9] 


PANEL  A2:  Dependent  variable:  log(monthly  earnings) 

OLS 


0.0698 
(0.000601) 


2SLS     Year  of  birth  dummies*  program  intensity  in  region  of  birth  0.0756 

(0.0280) 
[0.73] 

PANEL  B:  Complete  Sample 


0.0698  0.0689 

(0.000602)       (0.000628) 


0.0925 

(0.0278) 

[0.63] 


0.0913 

(0.0219) 

[0.58] 


0.134 

(0.0631) 

[0.7] 


PANEL B1: Dependent variable: participation in the wage sector
OLS 


2SLS     Year  of  birth  dummies*  program  intensity  in  region  of  birth 


OLS 


2SLS     Year  of  birth  dummies*  program  intensity  in  region  of  birth 


Control  variables: 

Year  of  birth*enrollment  rate  in  1971 
Year  of  birth*  water  and  sanitation  program 
Propensity  score,  propensity  score  squared 


0.0328 
(0.00311) 

0.0327 
(0.000311) 

0.0337 
(0.000319) 

0.101 

(0.0210) 

[0.66] 

0.118 

(0.0197) 

[0.93] 

0.0892 

(0.0162) 

[1.12] 

Self-employed individuals

0.0539                0.0539 
(0.000354)           (0.000354) 

0.0539 
(0.000355) 

0.0509 

(0.0157) 

[0.68] 

0.0745 

(0.0136) 

[0.58] 

0.0346 

(0.0138) 

[1.16]

No 

No 
No 

Yes 

No 
No 

Yes 
Yes 

No 

Yes 

No 

Yes 


Notes: Year of birth dummies, region of birth dummies and the interactions between year of birth dummies and the number of children in the region
of birth in 1971 are included in the regressions. Standard errors are in parentheses. F-statistics of the test of overidentification restrictions are in
square brackets.
TABLE 8 -- EVALUATION OF THE PROGRAM'S NET RETURN

                                                    Deadweight loss coefficient
                                                    0.2                 0.6
                                                    (1)      (2)        (3)      (4)

PANEL A: Results

Control for year of birth * enrollment rate         No       Yes        No       Yes

First year where benefits > costs (discount rate = 5%)
  In annual value                                   1996     1996       1997     1997
  In discounted sum                                 2005     2002       2009     2005

Discounted sum of net benefits in 2050 (g after 1997 = 5%, discount rate = 5%)
  In million 1990 US$                               13,025   13,096     11,340   18,807
  As a fraction of Indonesia's GDP in 1973          0.30     0.36       0.31     0.52
  Divided by initial costs                          24.1     24.2       21.0     35.0

Discounted sum of net benefits in 2050 (g after 1997 = 2%, discount rate = 5%)
  In million 1990 US$                               6,691    11,589     5,008    9,905
  As a fraction of Indonesia's GDP in 1973          0.18     0.32       0.14     0.27
  Divided by initial costs                          12.4     21.4       9.26     18.3

Discounted sum of net benefits in 2050 (g from 1973 = 2%, discount rate = 5%)
  In million 1990 US$                               -631.6   1,200      -2,315   -483
  As a fraction of Indonesia's GDP in 1973          -0.017   0.033      -0.063   -0.013
  Divided by initial costs                          -1.16    2.22       -4.28    -0.89

Internal rate of return
  g after 1997 = 5%                                 0.102    0.118      0.0895   0.105
  g after 1997 = 2%                                 0.088    0.106      0.0750   0.0915
  g from 1973 = 2%                                  0.0443   0.059      0.0326   0.0467

PANEL  B:  Assumptions  and  Parameters 

Population  growth  rate  after  1997  0.015 

Yearly  teacher  salary  in  1973  (1990  US  $)  363 

Yearly teacher salary in 1995 (1990 US $) 2,467

Total  recurrent  costs/teacher  salary  1.25 

Total  cost  of  construction  (million  1990  US  $)  522 

Number of schools constructed 61,800

Lifetime of the schools (years) 20

Share  of  labor  income  in  GDP OJ 

Note:  The  estimates  underlying  these  calculations  are  taken  from  table  5  (columns  7  and  8). 

Program  effect  has  been  set  to  0  for  children  aged  7  or  older  in  1974. 

The  internal  rate  of  return  is  the  interest  rate  such  that  the  net  present  value  of  the  project  at  infinity  is  0. 


TABLE A1 -- UNRESTRICTED ESTIMATES OF PROGRAM EFFECT: COEFFICIENTS OF THE INTERACTION BETWEEN
AGE IN 1974 AND THE NUMBER OF SCHOOLS PER 1000 CHILDREN BUILT IN THE REGION OF BIRTH

              Education                      Log(wage)
Age in 1974   (1)       (2)       (3)        (4)       (5)       (6)
23            -0.072    -0.83     -0.15      -0.007    -0.012    -0.013
              (0.074)   (0.077)   (0.086)    (0.023)   (0.023)   (0.026)
22            0.012     -0.0024   -0.035     0.0024    0.0039    0.13
              (0.068)   (0.071)   (0.079)    (0.022)   (0.022)   (0.025)
21            0.067     0.059     0.033      0.017     0.019     0.027
              (0.066)   (0.070)   (0.077)    (0.021)   (0.021)   (0.024)
20            0.039     0.014     -0.038     -0.015    -0.0052   -0.02
              (0.066)   (0.070)   (0.077)    (0.021)   (0.021)   (0.024)
19            0.048     0.051     0.03       0.02      0.24      0.045
              (0.061)   (0.065)   (0.072)    (0.021)   (0.021)   (0.023)
18            0.011     0.032     -0.003     -0.0032   0.0027    0.022
              (0.067)   (0.070)   (0.077)    (0.021)   (0.021)   (0.024)
17            0.087     0.068     0.031      0.0370    0.01      0.023
              (0.065)   (0.069)   (0.076)    (0.021)   (0.021)   (0.023)
16            -0.021    -0.007    -0.081     0.0012    0.0043    0.015
              (0.062)   (0.066)   (0.073)    (0.020)   (0.020)   (0.023)
15            -0.03     -0.025    -0.086     -0.01     -0.0091   0.0047
              (0.063)   (0.067)   (0.074)    (0.020)   (0.020)   (0.023)
14            0.081     0.088     0.072      0.019     0.023     0.052
              (0.058)   (0.062)   (0.069)    (0.019)   (0.019)   (0.022)
13            0.064     0.071     0.014      -0.0057   0.000054  0.016
              (0.069)   (0.071)   (0.079)    (0.022)   (0.022)   (0.024)
12            -0.0067   0.0024    -0.009     0.019     0.025     0.045
              (0.064)   (0.067)   (0.075)    (0.020)   (0.020)   (0.023)
11            0.039     0.052     0.0073     -0.011    -0.0067   0.0095
              (0.062)   (0.066)   (0.073)    (0.019)   (0.020)   (0.022)
10            0.087     0.077     0.067      0.001     0.011     0.024
              (0.064)   (0.067)   (0.075)    (0.020)   (0.020)   (0.023)
9             0.17      0.17      0.14       0.012     0.016     0.036
              (0.058)   (0.062)   (0.068)    (0.019)   (0.019)   (0.021)
8             0.12      0.13      0.1        0.022     0.027     0.045
              (0.065)   (0.068)   (0.075)    (0.020)   (0.020)   (0.023)
7             0.15      0.17      0.15       -0.0068   0.0013    0.025
              (0.062)   (0.066)   (0.073)    (0.019)   (0.020)   (0.022)
6             0.17      0.2       0.24       0.014     0.02      0.036
              (0.060)   (0.064)   (0.071)    (0.019)   (0.019)   (0.022)
5             0.13      0.16      0.12       0.024     0.029     0.07
              (0.061)   (0.064)   (0.072)    (0.020)   (0.020)   (0.023)
4             0.14      0.15      0.16       0.021     0.027     0.056
              (0.058)   (0.062)   (0.069)    (0.019)   (0.019)   (0.022)
3             0.14      0.17      0.19       0.011     0.019     0.045
              (0.062)   (0.066)   (0.074)    (0.020)   (0.020)   (0.024)
2             0.17      0.22      0.18       0.018     0.029     0.058
              (0.059)   (0.063)   (0.071)    (0.019)   (0.020)   (0.023)

Year of birth * 1971 enrollment rate
              No        Yes       Yes        No        Yes       Yes
Year of birth * water and sanitation
              No        No        Yes        No        No        Yes
R squared     0.19      0.19      0.17       0.14      0.15      0.14
Observations  152,989   152,495   143,107    60,208    60,466    55,144

Notes: Standard errors are in parentheses. All regressions include year of birth dummies, region
of birth dummies, and year of birth dummies interacted with the number of children in 1971. The
omitted group is the group aged 24 in 1974.